You are currently viewing How to Remove Duplicate Records from a Snowflake Table

How to Remove Duplicate Records from a Snowflake Table

There are multiple ways to remove the duplicates but in this post, we will focus on one particularly efficient technique to remove duplicates and provide a detailed explanation to help you understand and apply it effectively in your work.

 

1.Set up the sample table and data.

To illustrate the process of removing duplicates, we will use a sample customer table with the following schema:

				
					CREATE TABLE customer (
    id INT AUTOINCREMENT PRIMARY KEY,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    email VARCHAR(100)
);
				
			

We will insert sample data into the customer table, including some duplicate records:

				
					INSERT INTO customer (first_name, last_name, email)
VALUES ('Suresh', 'Reddy', 'suresh.reddy@example.com'),
       ('Suresh', 'J', 'Suresh.smith@example.com'),
       ('Shiva', 'Balaji', 'bala.siva@example.com'),
       ('Suresh', 'Reddy', 'suresh.reddy@example.com'),
       ('Shiva', 'Balaji', 'bala.siva@example.com');
				
			

Step 2: Identify and remove duplicate records using SQL

Notice that there are duplicate records for Suresh Reddy and Shiva Balaji.

				
					SELECT
        id,
        ROW_NUMBER() OVER (PARTITION BY first_name, last_name, email ORDER BY id) AS row_num
    FROM

        customer
				
			

The above query works by utilizing the ROW_NUMBER() window function to assign a unique row number to each row within a partition. The partition is defined by the combination of first_namelast_name, and email columns

				
					DELETE FROM CUSTOMER WHERE ID IN (
  SELECT id FROM (
    SELECT
        id,
        ROW_NUMBER() OVER (PARTITION BY first_name, last_name, email ORDER BY id) AS row_num
    FROM
        customer
  )
  WHERE row_num > 1
);
				
			

The DELETE statement then removes all rows whose id values appear in the subquery with a row_num greater than 1.

Step 3: Verify the results

To ensure the duplicate records have been removed from the customer table, run a SELECT query:

				
					SELECT * FROM customer;
				
			

Leave a Reply