skip to Main Content

In the following data:

id_1 id_2 colA colB colC
1 2 2022-01-02 scroll fast 12
1 2 2022-01-02 scroll fast 12
1 3 2022-01-03 scroll fast fast 11
1 3 2022-01-03 scroll fast fast 11
1 3 2022-01-03 scroll fast fast 11

I would like to remove duplicates only when they occur in pair. The output table should have one of the rows from the first two rows and remaining rows should be left as is.

The output table would be

id_1 id_2 colA colB colC
1 2 2022-01-02 scroll fast 12
1 3 2022-01-03 scroll fast fast 11
1 3 2022-01-03 scroll fast fast 11
1 3 2022-01-03 scroll fast fast 11

I have large dataset, so I cannot filter based on Id_1 and id_2. I am looking for a generic solution.

Test data is here at Sqlfiddle.

2

Answers


  1. We can use a combination of COUNT() and ROW_NUMBER(), both as analytic functions:

    WITH cte AS (
        SELECT t.*, COUNT(*) OVER (PARTITION BY id_1, id_2) cnt,
                    ROW_NUMBER() OVER (PARTITION BY id_1, id_2 ORDER BY id_1) rn
        FROM yourTable t
    )
    
    SELECT id_1, id_2, colA, colB, colC
    FROM cte
    WHERE cnt <> 2 OR rn = 1;
    

    The above logic returns rows appearing only once or appearing 3 times or more. In the case of pair duplicates, it arbitrarily returns one of them.

    Login or Signup to reply.
  2. You can use a MERGE statement and count the size of each group using analytic functions and then delete the second row of a group when there are only 2 rows per group:

    MERGE INTO table_name dst
    USING (
      SELECT ROWID AS rid,
             ROW_NUMBER() OVER (PARTITION BY id_1, id_2, colA, colB, colC ORDER BY ROWNUM)
               AS rn,
             COUNT(*) OVER (PARTITION BY id_1, id_2, colA, colB, colC) AS cnt
      FROM   table_name
    ) src
    ON (dst.ROWID = src.rid AND src.rn = 2 AND src.cnt = 2)
    WHEN MATCHED THEN
      UPDATE SET colC = colC
      DELETE WHERE 1 = 1;
    

    Which, for the sample data:

    create table table_name (id_1, id_2, colA, colB, colC) AS
    SELECT 1, 2, 'A', 'B', 'C' FROM DUAL UNION ALL
    SELECT 1, 2, 'A', 'B', 'C' FROM DUAL UNION ALL
    SELECT 1, 3, 'D', 'E', 'F' FROM DUAL UNION ALL
    SELECT 1, 3, 'D', 'E', 'F' FROM DUAL UNION ALL
    SELECT 1, 3, 'D', 'E', 'F' FROM DUAL;
    

    Then after the MERGE the table contains:

    ID_1 ID_2 COLA COLB COLC
    1 2 A B C
    1 3 D E F
    1 3 D E F
    1 3 D E F

    fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search