2

How to remove duplicate rows (records) from a table in a SQL

Posted by LordFury on June 13, 2011 in SQL Tutorials

Before, I dive into examples I just wanted to ask everyone who is reading this article, how many times did you run a while loop or a cursor? Perhaps you worked on the procedure that inserted records into the table multiple times and the table had no constraints or maybe you accidentally inserted the same record twice? Whatever the case will be if you have worked in SQL SERVER even at the beginner level, chances are you have met this problem. I recently was migrating the large amount of permissions data for a security program and met that issue. Right away I went to my #1 helper google and found crapload of solutions. But of course some are too simple, while others are too specific, so after figuring this stuff out I decided to write an easy to follow guide for everyone beginners –> advanced that is easy to follow and eliminate the hassle of constantly browsing for a solution to this annoying problem. So lets dive in!

EXAMPLE #1 ( A FEW sets of DUPLICATES)Lets assume you are using SQL SERVER 2000 where there was no forced primary key and so you don’t define a primary key and insert the duplicates. For the sake of example I’ll create a simple table with dups like this:

[code lang="sql"]
DECLARE @Employee TABLE (firstName Varchar(20), lastName Varchar(50), Occupation varchar (50))
INSERT INTO @Employee Values ( 'Slim' ,'Jim', 'Bum' )INSERT INTO @Employee Values ( 'Brad' ,'Pitt', 'Actor' )
INSERT INTO @Employee Values ( 'Brad' ,'Pitt', 'Actor' )INSERT INTO @Employee Values ( 'Yuriy' ,'Nagorny', 'A. N. Content Provider' )
INSERT INTO @Employee Values ( 'Steven' ,'Spielberg', 'Executive Producer/Director' )
INSERT INTO @Employee Values ( 'Steven' ,'Spielberg', 'Executive Producer/Director' )
[/code]

First of all, How do we look for duplicates? That can be accomplished with this easy query.
Select firstName, lastName, Occupation,COUNT(*) AS duplicateCount FROM @EmployeeGroup by firstName,lastName, OccupationHaving Count(*) > 1
This basically says show me the data , and to the right of it add another column that displays the count. Do this for all the records that occur more than once!
SO HOW DO WE DELETE THE DUPLICATES its simple.First we set the row count like this:
SET ROWCOUNT 1 –this statement means that whatever happens it will happen to a maximum of one row so that when we write our delete statement it will delete one duplicate row and leave the other one in the table

[code lang="sql"]
DELETE FROM @Employee WHERE firstName = 'Steven' AND lastName='Spielberg'
SET ROWCOUNT 0 -- coming back to life :)
[/code]

setting things to the way the were, so that when we do our select *from @ Employee we see all the rows.
Now lets say for instance that we had 5 ‘Steven Spielbergs’ how do we delete 4 and leave 1 in the table? Simple!!!Set the rowcount to 4 and run the delete statement. Then it will delete 4 rows and leave 1 row.

EXAMPLE #2 (many sets of duplicates)

[code lang="sql"]
DECLARE @Employee TABLE (firstName Varchar(20), lastName Varchar(50),Occupation varchar (50))
INSERT INTO @Employee Values ( 'Slim' ,'Jim', 'Bum' )
INSERT INTO @Employee Values ( 'Brad' ,'Pitt', 'Actor' )
INSERT INTO @Employee Values ( 'Brad' ,'Pitt', 'Actor' )
INSERT INTO @Employee Values ( 'Yuriy' ,'Nagorny', 'A. N. Content Provider' )
INSERT INTO @Employee Values ( 'Steven' ,'Spielberg', 'Executive Producer/Director' )
INSERT INTO @Employee Values ( 'Steven' ,'Spielberg', 'Executive Producer/Director' )
... ... ...
[/code]

The best approach to delete the duplicates in the case where you have hundreds of them is this:
1) select the distinct dups into a temporary table (just like we identify them)
2) Remove all the records from the original table that match the records that we selected into the temporary table.
3) insert unique records from the temporary table into the original table
4) Grab pizza and party 🙂

[code lang="sql"]
1) 
SELECT firstName, lastName, Occupation INTO #duplicates 
FROM @Employee
GROUP BY firstName, lastName,Occupation
HAVING COUNT(*) > 1

2)
-- delete all rows that are duplicated
DELETE FROM @Employee
FROM @Employee o 
INNER JOIN #duplicates d ON d.firstName= o.firstName and d.lastName=o.lastName and d.Occupation=o.Occupation

3)
-- insert one row for every duplicate set
INSERT INTO @Employee (firstName, lastName,Occupation)
SELECT firstName, lastName, Occupation 
FROM #duplicates
[/code]

4)

Happy man eating pizza

EXAMPLE #3
Deleting the table
You can also do the following.
Create an actual object table not a table in memory for instance:

[code lang="sql"]
CREATE TABLE DuplicatesTable(dataRows VARCHAR(50))
INSERT INTO DuplicatesTable VALUES ('Unique Row')
INSERT INTO DuplicatesTable VALUES ('duplicate row1')
INSERT INTO DuplicatesTable VALUES ('duplicate row1')
INSERT INTO DuplicatesTable VALUES ('duplicate row1')
INSERT INTO DuplicatesTable VALUES ('duplicate row2')
INSERT INTO DuplicatesTable VALUES ('duplicate row2')
INSERT INTO DuplicatesTable VALUES ('Different Unique Row')
INSERT INTO DuplicatesTable VALUES ('Another Unique Row')
INSERT INTO DuplicatesTable VALUES ('duplicate row3')
INSERT INTO DuplicatesTable VALUES ('duplicate row3')
INSERT INTO DuplicatesTable VALUES ('duplicate row3')
INSERT INTO DuplicatesTable VALUES ('duplicate row3')
INSERT INTO DuplicatesTable VALUES ('duplicate row3')
[/code]

NOW SELECT DISTINCT rows from our table into a brand new permanent object table like so

[code lang="sql"]
SELECT DISTINCT dataRows
INTO tmpNonDuplicateTableFROM DuplicatesTable
[/code]

Afterwards erase everything from the table that has duplicates and drop it!
[code lang=”sql”]
TRUNCATE TABLE DuplicatesTable DROP TABLE DuplicatesTable
[/code]
We are almost done, finally we need to go to the object explorer in the sql server management studio and rename our
tmpNonDuplicateTable to DuplicatesTable or simply execute the following:

[code lang="sql"]
EXEC sp_rename 'tmpNonDuplicateTable ', 'DuplicatesTable '
[/code]

There are several other ways to remove duplicates using a CURSOR or a WHILE loop but I don’t like those methods because they take way too long and occupy a bunch of memory!
Now I hope this article helps someone. Feel free to write comments and ask me questions on anything that is unclear or if you have a specific problem

Tags: , , , , , , , , , , , , , ,

Copyright © 2011-2024 Fury Training All rights reserved.
Desk Mess Mirrored version 1.9.1 theme from BuyNowShop.com.