17 October 2025

How would you write a query to find duplicates in a table?

Let’s imagine you’re working at a company, and you've just received a new database that contains customer information. You’re excited to get started, but as you dig deeper into the data, you realize there’s a problem. Some of the customer records are duplicated. It’s not just one or two duplicates here and there; the problem is widespread. This is a common issue that many database administrators, data analysts, and developers face: duplicates in a database table. The worst part? If you don’t handle this problem, it could lead to inaccurate reports, skewed insights, and wasted resources.

So, what do you do? You don’t panic. You write a SQL query to find duplicates in your table. SQL makes this process easier than you might think. Whether you're a seasoned database pro or just starting your journey with SQL, this blog will guide you through the process of finding and handling duplicates in your database tables.

What Does It Mean to Have Duplicates in a Table?

Before diving into the query-writing part, let’s take a step back and understand what we mean by “duplicates.” In the world of databases, a duplicate refers to two or more rows in a table where the values in one or more columns are identical. This could mean that a customer's information—such as their name, email address, or phone number—appears multiple times in the database.

Imagine a table like this:

Customer_ID	Name	Email
1	John Doe	john.doe@example.com
2	Jane Smith	jane.smith@example.com
3	John Doe	john.doe@example.com
4	Mark Lee	mark.lee@example.com
5	John Doe	john.doe@example.com

Here, John Doe appears three times with the same email address, so two of those rows are duplicates. Note that duplicates are not always this easy to spot: they may only show up when you compare a combination of columns, and detecting them can take a more targeted query.
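To experiment with the sample table, here is a minimal sketch that loads it into a throwaway in-memory SQLite database using Python's built-in sqlite3 module. The choice of SQLite and the column types are assumptions made for illustration; the article does not say which database engine is in use.

```python
import sqlite3

# Sample rows copied from the table above.
ROWS = [
    (1, "John Doe", "john.doe@example.com"),
    (2, "Jane Smith", "jane.smith@example.com"),
    (3, "John Doe", "john.doe@example.com"),
    (4, "Mark Lee", "mark.lee@example.com"),
    (5, "John Doe", "john.doe@example.com"),
]

conn = sqlite3.connect(":memory:")  # nothing is written to disk
conn.execute(
    "CREATE TABLE customers ("
    "Customer_ID INTEGER PRIMARY KEY, Name TEXT, Email TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", ROWS)

# A quick sanity check: more rows than distinct emails hints at duplicates.
total = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
distinct_emails = conn.execute(
    "SELECT COUNT(DISTINCT Email) FROM customers"
).fetchone()[0]
print(total, distinct_emails)  # prints: 5 3
```

Comparing the total row count with the number of distinct values is only a rough first signal; it tells you duplicates exist but not which rows they are.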

Why Is Finding Duplicates So Important?

Duplicates aren’t just a minor inconvenience; they can be a serious problem in data management. For one, they can skew your reports and analysis. Imagine running an analysis of customer behavior or sending out an email campaign, only to realize that some customers were counted multiple times due to duplicate entries. Not only does this waste time, but it also compromises the quality of your work.

Moreover, duplicates can slow down your queries and data retrieval. The more redundant rows a table holds, the more data the database engine has to scan, sort, and join, and the effect becomes noticeable on large datasets.

Simply put, finding and eliminating duplicates helps maintain data integrity, improve performance, and ensure accuracy in your reports and analysis.

SQL Query to Find Duplicates

Now that you understand the problem, let's move on to how you can actually write a SQL query to find duplicates in your table.

In SQL, one of the simplest ways to find duplicates is by using the GROUP BY clause. This clause allows you to group rows by one or more columns and apply aggregate functions like COUNT() to count how many times each value appears in the table. Here's the basic query:

SELECT Name, COUNT(*) AS count
FROM customers
GROUP BY Name
HAVING COUNT(*) > 1;

Explanation of the Query:

SELECT Name, COUNT(*) AS count: This part tells SQL that we want to select the Name column and count how many times each name appears in the table.
FROM customers: This indicates the table we are working with (in this case, the customers table).
GROUP BY Name: The GROUP BY clause groups the rows based on the Name column, so that we can count how many times each name appears.
HAVING COUNT(*) > 1: The HAVING clause is like the WHERE clause, but it’s used to filter grouped rows. In this case, it filters out the names that appear only once, returning only the duplicate entries (those names that appear more than once).

Output Example:

Name     | count
---------|------
John Doe | 3

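The result shows that the name "John Doe" appears 3 times in the table: one original record plus 2 redundant copies. Keep in mind that grouping by Name alone can also flag different people who happen to share a name, so a single column is not always enough to prove that two rows describe the same customer.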
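The GROUP BY and HAVING query can be tried end to end with a short script. This is a sketch that assumes SQLite (through Python's built-in sqlite3 module) and the sample customers table from earlier; other database engines should accept the same SELECT statement, but that is not verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "Customer_ID INTEGER PRIMARY KEY, Name TEXT, Email TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "John Doe", "john.doe@example.com"),
        (2, "Jane Smith", "jane.smith@example.com"),
        (3, "John Doe", "john.doe@example.com"),
        (4, "Mark Lee", "mark.lee@example.com"),
        (5, "John Doe", "john.doe@example.com"),
    ],
)

# Group rows by Name and keep only the groups with more than one row.
FIND_DUPLICATE_NAMES = """
    SELECT Name, COUNT(*) AS count
    FROM customers
    GROUP BY Name
    HAVING COUNT(*) > 1
"""

duplicates = conn.execute(FIND_DUPLICATE_NAMES).fetchall()
for name, count in duplicates:
    # count - 1 rows are redundant; one row is the record worth keeping.
    print(f"{name}: {count} rows ({count - 1} redundant)")
```

Names that occur only once (Jane Smith and Mark Lee) never reach the result, because HAVING is applied after the groups have been counted.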

Finding Duplicates Across Multiple Columns

What if the duplicates aren’t based on just one column? In some cases, you might want to find duplicates based on a combination of columns. Let’s say you want to find rows where both the Name and Email columns are duplicated. You can modify the query like this:

SELECT Name, Email, COUNT(*) AS count
FROM customers
GROUP BY Name, Email
HAVING COUNT(*) > 1;

Explanation:
Here, we are now checking for duplicates based on both the Name and Email columns. If a combination of both these fields is duplicated, SQL will show you the results. This is useful when you need to ensure that both fields are unique together (for example, no two people can have the same name and email combination).

Output Example:

Name     | Email                | count
---------|----------------------|------
John Doe | john.doe@example.com | 3

In this case, we see that the combination of John Doe and john.doe@example.com appears 3 times in the table.
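The difference between grouping on one column and on two is easier to see with a row that shares a name but not an email. The sketch below assumes SQLite via Python's sqlite3 module and adds one hypothetical customer (Customer_ID 6, a second John Doe with a different address) that is not part of the article's sample data. It also joins the duplicated combinations back to the table to list the affected Customer_ID values, which the count alone does not reveal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "Customer_ID INTEGER PRIMARY KEY, Name TEXT, Email TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "John Doe", "john.doe@example.com"),
        (2, "Jane Smith", "jane.smith@example.com"),
        (3, "John Doe", "john.doe@example.com"),
        (4, "Mark Lee", "mark.lee@example.com"),
        (5, "John Doe", "john.doe@example.com"),
        # Hypothetical extra row: same name, different person.
        (6, "John Doe", "j.doe@example.org"),
    ],
)

# Grouping by Name alone also counts the namesake.
by_name = conn.execute(
    "SELECT Name, COUNT(*) FROM customers "
    "GROUP BY Name HAVING COUNT(*) > 1"
).fetchall()

# Grouping by Name and Email counts only exact repeats of the pair.
by_pair = conn.execute(
    "SELECT Name, Email, COUNT(*) FROM customers "
    "GROUP BY Name, Email HAVING COUNT(*) > 1"
).fetchall()

# Join the duplicated pairs back to the table to see the actual rows.
duplicate_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.Customer_ID
        FROM customers AS c
        JOIN (
            SELECT Name, Email
            FROM customers
            GROUP BY Name, Email
            HAVING COUNT(*) > 1
        ) AS d ON c.Name = d.Name AND c.Email = d.Email
        ORDER BY c.Customer_ID
        """
    )
]

print(by_name)        # [('John Doe', 4)]
print(by_pair)        # [('John Doe', 'john.doe@example.com', 3)]
print(duplicate_ids)  # [1, 3, 5]
```

Customer 6 is counted when grouping by Name only, but is left out once Email is part of the grouping key, so the two-column query is the safer basis for any cleanup.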

How to Remove Duplicates

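Once you’ve identified the duplicates, the next step is often to remove them. You don’t want to leave the table cluttered with unnecessary data. You can delete duplicates while keeping one record from each group by combining a Common Table Expression (CTE) with the ROW_NUMBER() window function. Because a DELETE is destructive, it is worth running it inside a transaction or against a backup copy first.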

Here’s a sample query that removes duplicates:

WITH CTE AS (
    -- Customer_ID must be selected here so the DELETE can reference it
    SELECT Customer_ID, Name, Email,
           ROW_NUMBER() OVER (PARTITION BY Name, Email ORDER BY Customer_ID) AS row_num
    FROM customers
)
DELETE FROM customers
WHERE Customer_ID IN (
    SELECT Customer_ID FROM CTE WHERE row_num > 1
);

Explanation:

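ROW_NUMBER() OVER (PARTITION BY Name, Email ORDER BY Customer_ID): This numbers the rows within each group of identical Name and Email values, starting from 1 for each group. Because the rows are ordered by Customer_ID, the record with the lowest Customer_ID in each group gets number 1.
The DELETE statement then removes the rows where row_num is greater than 1 (i.e., the duplicates), so the earliest record in each group is the one that is kept.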
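The removal step can be rehearsed safely on an in-memory copy of the sample data. This sketch assumes SQLite through Python's sqlite3 module, and an SQLite version of 3.25 or later, since older versions lack window functions such as ROW_NUMBER(). Note that the CTE here selects Customer_ID so that the subquery in the DELETE can refer to it. Syntax support for a CTE in front of DELETE varies between database engines, so treat this as an illustration rather than a portable script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "Customer_ID INTEGER PRIMARY KEY, Name TEXT, Email TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        (1, "John Doe", "john.doe@example.com"),
        (2, "Jane Smith", "jane.smith@example.com"),
        (3, "John Doe", "john.doe@example.com"),
        (4, "Mark Lee", "mark.lee@example.com"),
        (5, "John Doe", "john.doe@example.com"),
    ],
)

# Number the rows in each (Name, Email) group; row_num 1 is the keeper.
DELETE_DUPLICATES = """
    WITH CTE AS (
        SELECT Customer_ID,
               ROW_NUMBER() OVER (
                   PARTITION BY Name, Email ORDER BY Customer_ID
               ) AS row_num
        FROM customers
    )
    DELETE FROM customers
    WHERE Customer_ID IN (SELECT Customer_ID FROM CTE WHERE row_num > 1)
"""

before = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

# "with conn" commits if the block succeeds and rolls back on an error.
with conn:
    conn.execute(DELETE_DUPLICATES)

remaining = conn.execute(
    "SELECT Customer_ID, Name FROM customers ORDER BY Customer_ID"
).fetchall()
removed = before - len(remaining)

print(removed)    # 2
print(remaining)  # [(1, 'John Doe'), (2, 'Jane Smith'), (4, 'Mark Lee')]
```

Customer_ID 1 survives because it sorts first within its group. Changing the ORDER BY inside the window (for example to Customer_ID DESC) would keep the most recent record instead.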

Why Duplicates Are a Problem:

Working with clean data is essential for any data-related task. If you leave duplicates in your table, they can:

Skew reports and data analysis.
Slow down queries and cause performance issues.
Lead to inaccurate conclusions in your decision-making.

By identifying and removing duplicates, you’re ensuring that your data is clean, consistent, and ready for use. It saves time, increases the performance of your database, and ensures that your analysis is accurate and reliable.

Conclusion

Finding and removing duplicates in SQL is one of the most essential skills for database management. Whether you’re a developer or a data analyst, knowing how to clean up your tables will make your database more efficient, your reports more reliable, and your queries faster. The simple yet powerful GROUP BY and HAVING clauses are your best tools for identifying duplicates, while the ROW_NUMBER() function helps you remove them with ease.

By mastering this SQL technique, you’ll make sure that your database remains accurate, efficient, and ready for complex analysis, without worrying about the clutter of duplicate data.

Author

Kashish Agrawal

What does GROUP BY do in an SQL query?

GROUP BY is used to group rows that have the same values in specified columns. It’s commonly used to aggregate data and identify duplicates.

How do I find duplicates based on multiple columns?

You can find duplicates by grouping by multiple columns, such as Name and Email, to identify rows where the combination of those columns is duplicated.

What is the purpose of the HAVING clause?

The HAVING clause is used to filter groups that have been formed by the GROUP BY clause. It’s ideal for isolating duplicates, because it can keep only the groups whose count is greater than 1.

How do I remove duplicates in SQL?

To remove duplicates, you can use the ROW_NUMBER() function to assign a unique number to each row in a group, then delete rows where the number is greater than 1.

Can I remove duplicates from a table and keep one row?

Yes, you can use a query with a CTE and ROW_NUMBER() to remove duplicate rows, keeping only one instance of each unique record.

Why is it important to remove duplicates in SQL?

Removing duplicates ensures that your data is accurate, improves database performance, and helps avoid skewed analysis or reports.
