Deleting Duplicate Rows in a SQL Server Table Using a CTE

Finding duplicate rows in a table is quite simple. Deleting only the duplicates – as opposed to deleting all rows that have duplicates – is a bit more complicated.

I have a table variable called “@Readings” with the following schema:

DECLARE @Readings TABLE
(
	TagName VARCHAR(100),
	ScanTime DATETIME,
	TagValue NUMERIC(10,3)
)

First, I load the table with data from another database on a linked server called “OtherDB”:

INSERT INTO @Readings(TagName,ScanTime,TagValue)
SELECT DISTINCT tag,[time],value
	FROM
	OPENQUERY(OtherDB,'select tag, time, value from archive.comp where (tag LIKE ''%test%'') AND time = DATE(''*'') ')

Then, I load that same data with some updates, both to the ScanTime column, and then to the TagValue column – always holding the TagName column constant.

DECLARE @DateTimeStamp DATETIME
SET @DateTimeStamp = GETDATE()

INSERT INTO @Readings(TagName,ScanTime,TagValue)
SELECT TagName,@DateTimeStamp,TagValue FROM @Readings

INSERT INTO @Readings(TagName,ScanTime,TagValue)
SELECT TagName,ScanTime,12345 FROM @Readings

To see what rows duplicate the TagName column alone, this query will suffice:

SELECT TagName FROM @PI_Readings GROUP BY TagName HAVING ( COUNT(*) > 1 )

By adding column names, you can look where the duplication is the TagName and the ScanTime:

SELECT TagName, ScanTime FROM @PI_Readings GROUP BY TagName, ScanTime HAVING ( COUNT(*) > 1 )

(If I were to add also the TagValue column, I would get zero rows returned, since there are currently no duplicates where all three columns are identical.)

So, what if you wanted to delete only the duplicates where TagName and ScanTime matched, irrespective of the TagValue?

The easiest way I have found is to use what is called a Common Table Expression, or CTE.

To use a CTE to delete these duplicates let’s first construct it to only show the duplicates:

WITH CTE AS
(
	SELECT TagName, ScanTime, TagValue,
	RN = ROW_NUMBER()OVER(PARTITION BY TagName, ScanTime ORDER BY TagName, ScanTime)
	FROM @Readings
)
SELECT * FROM CTE WHERE RN > 1

To now delete the offending rows, change the last line from a SELECT statement to a DELETE statement:

When creating the CTE statement, make sure there is a semicolon terminating the previous statement. This is not usually required in SQL Server, but it is in this case.

Another good example of this is on Stack Overflow.

While I do like SQL Fiddle, it seems that the DELETE function does not work on CTEs there, though SELECT statements do.

Deleting Duplicate Rows in a SQL Server Table Using a CTE

Like this:

Related

Leave a ReplyCancel reply

Share this:

Like this:

Related

Leave a ReplyCancel reply