skip to Main Content

I have a t1 table with duplicates. I would like to identify these duplicates by putting a flag on them.

  • I would like labeled all the lines whose idcommande is unique as "keep"
  • For the same idCommand, we set the one with status = ‘in progress’ AND typecom = ‘canceled’ to "keep"
  • For the same idcommand, we set to keep those which have an idRef, if idRef is not null.

The flag can be of any type True or False or other

CREATE TABLE t1 (
    id SERIAL PRIMARY KEY, 
    idcommande VARCHAR(50),
    date_ DATE,
    typecom VARCHAR(50),
    statut VARCHAR(20),
    idref INT
);

INSERT INTO t1 (idcommande, date_, statut, typecom, idref) VALUES
('1001', '2024-09-15', 'in progress', 'in progress', 1),  -- don't want
('1001', '2024-09-15', 'in progress', 'canceled', 2),     -- keep
('1002', '2024-09-16', 'completed', 'sent', 3),           -- keep
('1003', '2024-09-16', 'completed', 'sent', NULL),         -- don't want           
('1003', '2024-09-17', 'completed', 'sent', 4);           -- keep
('1004', '2024-09-15', 'completed', 'canceled', 5),     -- keep

I tried to do this but I don’t really understand how to use partition. The lines with rownum at 1 are kept but the line with idref at null is also kept even though I don’t want it

WITH RankedResults AS (
   select *,
       ROW_NUMBER() OVER (
           PARTITION BY t1.idcommande 
           ORDER BY 
               CASE 
                   WHEN t1.statut = 'in progress' and t1.typeCom != 'canceled' THEN 1 
                   WHEN t1.idref IS NULL THEN 1
                   ELSE 0 
               END,
               t1.date_, 
               t1.typecom
       ) AS RowNum
   FROM (
       SELECT *
       FROM t1
    ) as t1
)
 
select * from (SELECT * FROM RankedResults) as tmain

Résult:
enter image description here

2

Answers


  1. Salut @dia05!

    • Check your conditions on the CASE statement:
      • t1.typecom = 'canceled' (instead of != 'canceled')
      • idref IS NOT NULL to prioritize rows where idref is present (instead of idref IS NULL).
      • ELSE 3
    WITH RankedResults AS (
      SELECT
        t1.*,
        ROW_NUMBER() OVER ( PARTITION BY t1.idcommande ORDER BY
          CASE
            WHEN t1.statut = 'in progress' AND t1.typecom = 'canceled' THEN 1
            WHEN t1.idref IS NOT NULL THEN 2
            ELSE 3
          END
        ) AS rownum
      FROM t1
    )
    SELECT
      id, idcommande, date_, typecom, statut, idref, RowNum,
      CASE
        WHEN cnt = 1 THEN 'keep'
        WHEN RowNum = 1 THEN 'keep'
        ELSE 'do not keep'
      END AS flag
    FROM RankedResults
    ORDER BY idcommande, RowNum
    

    Output:

    id idcommande date_ typecom statut idref rownum flag
    2 1001 2024-09-15 canceled in progress 2 1 keep
    1 1001 2024-09-15 in progress in progress 1 2 do not keep
    3 1002 2024-09-16 sent completed 3 1 keep
    4 1002 2024-09-16 sent completed NULL 2 do not keeo
    5 1003 2024-09-17 sent completed 4 1 keep
    Login or Signup to reply.
  2. hope this would help
    I would suggest checking the case statement the typeCom should by =’cancel’ instead of != :

    WHEN t1.statut = ‘in progress’ and t1.typeCom != ‘canceled’THEN 1
    WHEN t1.idref IS NULL THEN 1 ELSE 0

    I think this query should help you:
    I separated each criteria on its own condition then prioritize the data on the order by statement

    SELECT idcommande, date_, statut, typecom, idref,case when RN=1 then 'keep' else 'remove'END Keep
     FROM (
    SELECT *
    ,Row_number() Over (Partition by idcommande
      order by idcommande,iif(statut='in progress',1,2),iif(typecom='canceled',1,2),iif(idref is not null,1,2)) RN 
    FROM t1
    )A

    Output:

    enter image description here

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search