Comparing Multiple Patterns with LIKE Operator

I really like the LIKE operator! (See what I did there?)

I use it on a daily basis. It does have one frustrating shortcoming, however. Unlike the IN operator, which compares a value against a whole list, each LIKE predicate can test only one pattern.

If you have only a few patterns, this isn’t too much of a problem.

SELECT * FROM tbl_ThatTableOverThere
WHERE ThisColumn LIKE '%pattern1%'
OR ThisColumn LIKE '%pattern2%'
OR ThisColumn LIKE '%pattern3%'
-- [...]
OR ThisColumn LIKE '%patternX%';

But what if X is a very large number? Hundreds? Thousands? Maybe even millions?

I wrote the code below to allow for multiple patterns to be compared. The patterns are inserted into a table variable (@patterns) and are selected using a fast forward cursor. Unless the number of patterns is truly astronomical, this should suffice.

USE DatabaseName
DECLARE @random INT, @upper INT, @lower INT, @randomTable NVARCHAR(20)
DECLARE @sourceTable NVARCHAR(255)
DECLARE @columnName NVARCHAR(255)
DECLARE @pattern NVARCHAR(255)
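DECLARE @patterns TABLE (pattern NVARCHAR(255) NOT NULL)
-- Variables used further down; NVARCHAR(MAX) is an assumed type
DECLARE @whereClause NVARCHAR(MAX), @deleteTempTable NVARCHAR(MAX), @SQL NVARCHAR(MAX)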

/* These variables should remain unchanged unless the name of the 
temporary table needs to have different text or range of random numbers
for the suffix of the temp table. 
[Reference:] */
SET @lower = 1
SET @upper = 9999999
SELECT @random = ROUND(((@upper - @lower) * RAND() + @lower),0)
SELECT @randomTable = '##search'+CAST(@random AS NVARCHAR(20))+''

-- Name of source table holding data for comparison
SET @sourceTable = 'tbl_Name'

-- Column that is being compared with patterns using LIKE operator
SET @columnName = 'ColumnName'

-- Additional WHERE clause may be specified here
SET @whereClause = '' -- e.g., 'AND AnyColumnName LIKE ''%b%''' 

-- Insert patterns into table variable below
INSERT INTO @patterns VALUES
('%pattern1%')
-- ,('%pattern2%')
-- ,('%pattern3%') and so on

-- Deletes temp table if it already exists
SET @deleteTempTable = 'IF object_id(''tempdb..' + @randomTable + ''') IS NOT NULL DROP TABLE ' + @randomTable
EXEC(@deleteTempTable)

-- This line creates the temporary table using the same schema as the source table without inserting data.
-- The UNION ALL operator prevents the IDENTITY property from being applied to any of the columns in the temp table.
-- [Reference:]
EXEC('SELECT TOP 0 * INTO ' + @randomTable + ' FROM ' + @sourceTable + ' UNION ALL SELECT TOP 0 * FROM ' + @sourceTable)

SET @SQL = 'INSERT INTO ' + @randomTable + CHAR(13) + 'SELECT TOP 0 * FROM ' + @sourceTable

DECLARE patternsCursor CURSOR FAST_FORWARD FOR
SELECT pattern FROM @patterns

OPEN patternsCursor
FETCH NEXT FROM patternsCursor INTO @pattern

-- Loop until the cursor runs out of patterns
WHILE @@FETCH_STATUS = 0
BEGIN
	IF LEN(@pattern) > 0
	BEGIN
		--Note: UNION operator (as opposed to UNION ALL) is used here to prevent duplicate rows from being inserted into the temp table
		SET @SQL = @SQL + CHAR(13) + 'UNION' + CHAR(13) + 'SELECT * FROM ' + @sourceTable + CHAR(13) + 'WHERE ' + @columnName + ' LIKE ''' + @pattern + ''''
		IF LEN(@whereClause) > 0 SET @SQL = @SQL + CHAR(13) + @whereClause
	END

	FETCH NEXT FROM patternsCursor INTO @pattern
END
CLOSE patternsCursor
DEALLOCATE patternsCursor
SET @SQL = @SQL + ';'

-- Prints final SQL statement on Results tab
SELECT @SQL
-- Executes SQL Statement showing number of rows inserted on Messages tab
EXEC(@SQL)

-- Executes SQL Statement showing rows in Results tab
EXEC('SELECT * FROM ' + @randomTable + ';')

-- Deletes temp table
EXEC(@deleteTempTable)
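
For comparison, the same multi-pattern match can be sketched without dynamic SQL by testing each row against the patterns table directly. This is a different technique from the cursor approach above, not a description of it, and the @source table with its rows is made-up sample data for illustration.

```sql
-- Hypothetical sample data standing in for the real source table.
DECLARE @source TABLE (ID INT, ThisColumn NVARCHAR(255));
INSERT INTO @source (ID, ThisColumn)
VALUES (1, N'red apple'),
       (2, N'green pear'),
       (3, N'apple pear tart'),
       (4, N'plum');

-- Same kind of patterns table the script above uses.
DECLARE @patterns TABLE (pattern NVARCHAR(255) NOT NULL);
INSERT INTO @patterns (pattern)
VALUES (N'%apple%'), (N'%pear%');

-- Keep a row when at least one pattern matches it.
-- EXISTS returns each source row at most once, even when several
-- patterns match (row 3 here), so no UNION is needed to de-duplicate.
SELECT s.ID, s.ThisColumn
FROM @source AS s
WHERE EXISTS (
	SELECT 1
	FROM @patterns AS p
	WHERE s.ThisColumn LIKE p.pattern
)
ORDER BY s.ID;
```

With this sample data the query returns rows 1, 2, and 3. Any additional filter would go in the outer WHERE clause, alongside the EXISTS test.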

Deleting Duplicate Rows in a SQL Server Table Using a CTE

Finding duplicate rows in a table is quite simple. Deleting only the duplicates – as opposed to deleting all rows that have duplicates – is a bit more complicated.

I have a table variable called “@Readings” with the following schema:

	TagName VARCHAR(100),
	ScanTime DATETIME, -- used by the statements below; DATETIME is an assumed type
	TagValue NUMERIC(10,3)

First, I load the table with data from another database on a linked server called “OtherDB”:

INSERT INTO @Readings(TagName,ScanTime,TagValue)
SELECT DISTINCT tag,[time],value
FROM OPENQUERY(OtherDB,'select tag, time, value from archive.comp where (tag LIKE ''%test%'') AND time = DATE(''*'') ')

Then, I load that same data with some updates, both to the ScanTime column, and then to the TagValue column – always holding the TagName column constant.

DECLARE @DateTimeStamp DATETIME = GETDATE()

INSERT INTO @Readings(TagName,ScanTime,TagValue)
SELECT TagName,@DateTimeStamp,TagValue FROM @Readings

INSERT INTO @Readings(TagName,ScanTime,TagValue)
SELECT TagName,ScanTime,12345 FROM @Readings

To see which TagName values appear more than once, this query will suffice:

SELECT TagName FROM @Readings GROUP BY TagName HAVING ( COUNT(*) > 1 )

By adding a column to the SELECT list and the GROUP BY clause, you can find the rows that share both the TagName and the ScanTime:

SELECT TagName, ScanTime FROM @Readings GROUP BY TagName, ScanTime HAVING ( COUNT(*) > 1 )

(If I also added the TagValue column, I would get zero rows returned, since there are currently no duplicates where all three columns are identical.)

So, what if you wanted to delete only the duplicates where TagName and ScanTime matched, irrespective of the TagValue?

The easiest way I have found is to use what is called a Common Table Expression, or CTE.

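To use a CTE to delete these duplicates, let’s first construct it to show only the duplicates:

WITH cte AS (
	SELECT TagName, ScanTime, TagValue, ROW_NUMBER() OVER (PARTITION BY TagName, ScanTime ORDER BY TagValue) AS RowNum
	FROM @Readings) -- the RowNum alias and its ORDER BY column are assumed
SELECT * FROM cte WHERE RowNum > 1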

To now delete the offending rows, change the last line from a SELECT statement to a DELETE statement:
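
A minimal, self-contained sketch of that DELETE step follows. It uses made-up sample rows in place of the linked-server data, and the CTE name, the RowNum alias, and the choice to order by TagValue (which keeps the lowest value in each group) are assumptions for illustration rather than the original code.

```sql
-- Hypothetical sample data standing in for the linked-server readings.
DECLARE @Readings TABLE (
	TagName VARCHAR(100),
	ScanTime DATETIME,
	TagValue NUMERIC(10,3)
);

INSERT INTO @Readings (TagName, ScanTime, TagValue)
VALUES ('test.tag1', '2018-01-01T08:00:00', 1.500),
       ('test.tag1', '2018-01-01T08:00:00', 12345),
       ('test.tag2', '2018-01-01T08:00:00', 2.250),
       ('test.tag2', '2018-01-01T09:00:00', 2.250);

-- Number the rows within each (TagName, ScanTime) group. Any row
-- numbered 2 or higher duplicates the first row of its group.
-- The INSERT above ends with a semicolon, which WITH requires.
WITH NumberedReadings AS (
	SELECT TagName, ScanTime, TagValue,
		ROW_NUMBER() OVER (PARTITION BY TagName, ScanTime ORDER BY TagValue) AS RowNum
	FROM @Readings
)
DELETE FROM NumberedReadings WHERE RowNum > 1;

-- One row per distinct (TagName, ScanTime) pair is left.
SELECT TagName, ScanTime, TagValue
FROM @Readings
ORDER BY TagName, ScanTime;
```

Deleting through the CTE removes the underlying rows from @Readings, so only the 12345 reading for test.tag1 disappears here. Changing the ORDER BY inside OVER() changes which row of each group survives.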

When creating the CTE statement, make sure there is a semicolon terminating the previous statement. This is not usually required in SQL Server, but it is in this case.

Another good example of this is on Stack Overflow.

While I do like SQL Fiddle, it seems that DELETE statements do not work against CTEs there, though SELECT statements do.

Using MAX() in a SQL Subquery

One pro bono project I’m working on is improving a school website where parents can sign up for classes, view students’ grades, etc. One of the problems with the website was that the list of user accounts, which includes all parents and teachers, includes parents from previous years who no longer have students there.

The query for pulling this information was very simple:

SELECT * FROM UserAccounts ORDER BY lastName ASC

This query pulled all users and ordered them only by last name. The data was being dumped into an ASP.NET GridView with column sorting, and there were only a couple hundred people there, so it wasn’t completely unmanageable. However, dealing with all the parents who no longer have kids there did make visual searches more difficult. The request was to bring all current parents to the top of the list so that other admins don’t have to search through pages of people who no longer attend.

A quick fix for this would be to sort the list based on last academic year attended. We’ll do this by employing the MAX() function.

In addition to the UserAccounts table, there are other tables called “students” (which includes all current and past students) and “schedule” (which holds student schedules). The schedule table has a column called “s_year” that holds the academic year (in the format of “2017-18”). We can join these tables based on the user account ID in the UserAccounts table, the parentID column in the students table, and the student ID (s_id) column in the schedule table.

One more thing: Since this list also includes people who have never had students there, the revised query will have to take that into account.

The query below not only solves this problem, but also sorts by last and first name within each academic year. Users with no academic year (a NULL value) will be at the bottom of the list.

SELECT DISTINCT ua.*, sc.s_year FROM UserAccounts ua 
LEFT OUTER JOIN students s ON s.parentID = ua.ID 
LEFT OUTER JOIN schedule sc ON sc.s_id = s.ID 
WHERE sc.s_year IS NULL 
OR sc.s_year = (
   SELECT MAX(sc2.s_year) FROM schedule sc2 WHERE sc2.s_id = s.ID
)
ORDER BY sc.s_year DESC, ua.lastName, ua.firstName

The only remaining thing to do is to add the year column to the GridView and make sure that sorting is enabled.
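
To see the query's behavior end to end, here is a minimal sketch that runs it against table variables filled with made-up rows. The table and column names follow the ones described above; the sample people, the IDs, and the data types are assumptions for illustration only.

```sql
-- Hypothetical stand-ins for the three tables described above.
DECLARE @UserAccounts TABLE (ID INT, firstName NVARCHAR(50), lastName NVARCHAR(50));
DECLARE @students TABLE (ID INT, parentID INT);
DECLARE @schedule TABLE (s_id INT, s_year VARCHAR(7));

INSERT INTO @UserAccounts (ID, firstName, lastName)
VALUES (1, N'Ann', N'Baker'),   -- child last attended in 2016-17
       (2, N'Raj', N'Patel'),   -- child attending in 2017-18
       (3, N'Lee', N'Young');   -- never had a student enrolled

INSERT INTO @students (ID, parentID)
VALUES (10, 1), (20, 2);

INSERT INTO @schedule (s_id, s_year)
VALUES (10, '2015-16'), (10, '2016-17'),
       (20, '2016-17'), (20, '2017-18');

-- Keep only each student's most recent academic year, plus users
-- who have no schedule rows at all (s_year is NULL for them).
SELECT DISTINCT ua.*, sc.s_year
FROM @UserAccounts AS ua
LEFT OUTER JOIN @students AS s ON s.parentID = ua.ID
LEFT OUTER JOIN @schedule AS sc ON sc.s_id = s.ID
WHERE sc.s_year IS NULL
   OR sc.s_year = (
	SELECT MAX(sc2.s_year) FROM @schedule AS sc2 WHERE sc2.s_id = s.ID
)
ORDER BY sc.s_year DESC, ua.lastName, ua.firstName;
```

With this data the rows come back as Patel (2017-18), Baker (2016-17), then Young (NULL), because SQL Server sorts NULL as the lowest value and so places it last in a descending sort. MAX() works on s_year here only because strings in the “2017-18” format sort chronologically as text.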

Solving the ROW-00060 Error When Using SSIS with Oracle

I have built SSIS packages to load data from one Oracle database to another before, but never had I come across this error until recently:

“ROW-00060: Internal error: [dainsert,16] Source: Oracle Destination: Oracle Error Occurred @ after 403k records.”

Initially, I thought that some constraint had been violated on the destination data source, such as an attempt to insert a NULL value into a NOT NULL column. I ran the package a few times and noticed that the row count was always in the neighborhood of 400k and that the package had run for hours. I thought it odd that so many rows could be inserted with no problem, so I double- and triple-checked to make certain that the constraints on the source columns were the same as those on the destination ones.

Though I found no clear-cut explanation of exactly why this error happens, several different people on different fora mentioned that they had overcome this particular error (at different row counts, not always 400k!) by using a connector from Attunity rather than the one that comes with the Oracle Client. Fortunately, this driver is made to work with SSIS and is distributed by Microsoft. It works not only with Oracle, but also with Teradata. Choose the version based on the version of SSIS you are using.

Version 2.0 for SQL 2012
Version 3.0 for SQL 2014
Version 4.0 for SQL 2016

After installing the connector into Visual Studio and restarting VS, you should be able to use it by selecting “Oracle Source” and/or “Oracle Destination” from the SSIS Toolbox.

After setting up the new source and destination connections, I was able to run the SSIS package to completion, loading over five million rows of data in less than an hour!