Home » Find and Delete Duplicates From a Table in SQL Server

Find and Delete Duplicates From a Table in SQL Server

by Online Tutorials Library

Find and Delete Duplicates From a Table in SQL Server

Many times, in SQL Server databases there are tables that hold repetitive or the same data. Sometimes this repetitive date needs to remove from the table to reduce the redundancy of the whole database. So, in this article, we are going to see how to find and remove duplicate rows from a table in SQL Server.

The various steps that we need to follow in order to find and remove duplicate rows from a table in SQL Server are:

  • The first step is to find the duplicate rows with the help of the GROUP BY clause, or we can also use the ROW_NUMBER() function.
  • Once we have found the duplicate rows successfully, the next step is to use the DELETE statement to remove the duplicate rows.

As discussed above, the first step is to identify or find the duplicates rows in the table that we want to delete for this, we need to perform some operations, and those operations are:

  • The first operation or step is to define criteria for the duplicates rows in a table that will distinguish that particular duplicate row from the other rows that are present in the same table of the database.
  • Now the next step is to write a query that will use the criteria that we have defined in the first step to distinguish the duplicate row in a table.

In this article, for finding the duplicate rows in a table, we will be using the Common Table Expression, also known as CTE. With the help of Common Table Expression or CTE, we can define a temporary named result set that will remain available till the execution of a statement, and that statement can be any statement like SELECT, INSERT, UPDATE, DELETE, or MERGE.

The syntax for a Common Table Expression or CTE in SQL Server is:

In the syntax written above:

  • Name_of_expression: The name_of_expression represents the name that we want to assign to our Common Table Expression or CTE. And this name is referred to later in the SQL statement in order to use this Common Table Expression or CTE.
  • Name_of_column: The name_of_column represents the column or columns on which the Common Table Expression or CTE will be defined.
  • definition_of_CTE: The definition_of_CTE represents the part where the actual definition of the Common Table Expression or CTE will be written.
  • SQL_statement: The SQL_statement represents the SQL statement for which that particular Common Table Expression or CTE is written and that Common Table Expression or CTE will remain valid till the execution of this SQL statement, such as SELECT INSERT, UPDATE, DELETE, or MERGE.

So, now we have an idea about what Common Table Expression or CTE is. We will also discuss Common Table Expression or CTE later in this article also. Now let us see how we will find the duplicate rows in a table. Let us take an example into consideration for understanding the usage of the Common Table Expression or CTE and then using it for finding and removing the duplicate entries in the table.

Let us create a table named students having three columns, namely studID, rollno, and Name having datatypes as varchar, int, and varchar, respectively. The syntax for creating a table with the schema mentioned above using the CREATE TABLE command is:

Output:

Find and Delete Duplicates From a Table in SQL Server

As shown in the image, we have successfully created a table having the schema mentioned above that is three columns, two having data type as varchar and one as int.

Now, let us add some data to the students’ table. The syntax of the INSERT command in SQL Server to add data in the students table will be:

Output:

studID     rollNo      Name                            ---------- ----------- ------------------------------  101              87459 Andrew   102              54771 Samuel   103              22100 Nirnay   104              94365 Paul   105              35479 Casey   106              74000 Martin   107              10211 Raphel   105              35479 Casey   103              22100 Nirnay   101              87459 Andrew  

As we can see, ten rows of data in the students table are added successfully, and we can see the same as the output of the SELECT query.

We have created our table and added data to the table successfully, and the next step is to find the duplicates entries in the students table using Common Table Expression or CTE. So, we will be creating a Common Table Expression or CTE for displaying all the duplicate rows that are present in the students table. The syntax for creating a Common Table Expression or CTE with the operation mentioned above is:

Output:

studID     rollNo      Name                            ---------- ----------- ------------------------------  101              87459 Andrew  101              87459 Andrew  103              22100 Nirnay  103              22100 Nirnay  105              35479 Casey  105              35479 Casey  

In the query written above, we have created a Common Table Expression or CTE named dup_cte, which counts the occurrences of each row in the students table with the use of the COUNT() function and will return only those rows of data from the table whose count is greater than one or we can also say that return only the duplicate rows or the rows that have multiple occurrences in the table. This data of the repetitive rows provided by the Common Table Expression or CTE is then used in the SELECT command to display all the repetitive rows of the students table. Then an INNER JOIN is applied between the students table and the rows provided by the Common Table Expression or CTE named dup_cte. And the final result is displayed here using the SELECT command in the SQL command section of the Common Table Expression or CTE. We can use any command other than the SELECT command to delete or update the data in this SQL command section of the Common Table Expression or CTE according to our need.

So now we have successfully identified or found the duplicate rows in the table that we want to delete, so the next step is to delete those duplicate rows from the table. The syntax for deleting duplicate rows from the students table using the Common Table Expression is:

Output:

studID     rollNo      Name                            ---------- ----------- ------------------------------  101              87459 Andrew  102              54771 Samuel  103              22100 Nirnay  104              94365 Paul  105              35479 Casey  106              74000 Martin  107              10211 Raphel  

As we can see in the output of the above query, all the duplicate rows in the student table have been removed successfully. In this query, we have used the ROW_NUMBER() function instead of the COUNT() function to get the duplicate rows in the students table. The Common Table Expression using the ROW_NUMBER() function produced this output as a result of the Common Table Expression or CTE named dup_cte:

studID     rollNo      Name                           row_num               ---------- ----------- ------------------------------ --------------------  101              87459 Andrew                                            1  101              87459 Andrew                                            2  102              54771 Samuel                                            1  103              22100 Nirnay                                            1  103              22100 Nirnay                                            2  104              94365 Paul                                              1  105              35479 Casey                                             1  105              35479 Casey                                             2  106              74000 Martin                                            1  107              10211 Raphel                                            1  

As we can see in the above output, each row has its occurrence number or row number displayed under the row_num named column. So, we used the output produced by the dup_cte named Common Table Expression and deleted only those rows whose row_num is greater than one. Subsequently, that will only delete those rows that appeared more than once in the table. Thus, removing the duplicate rows from the table.

So, with the help of this article, we are able to successfully find and delete duplicates from a table in SQL Server with the help of Common Table Expression or CTE, COUNT() Function, and ROW_COUNT() function.


You may also like