Postgresql - identified duplicates according to criteria

dia05
September 19, 2024
89 views
0 votes
2 Answers

I have a t1 table with duplicates. I would like to identify these duplicates by putting a flag on them.

I would like labeled all the lines whose idcommande is unique as "keep"
For the same idCommand, we set the one with status = ‘in progress’ AND typecom = ‘canceled’ to "keep"
For the same idcommand, we set to keep those which have an idRef, if idRef is not null.

The flag can be of any type True or False or other

CREATE TABLE t1 (
    id SERIAL PRIMARY KEY, 
    idcommande VARCHAR(50),
    date_ DATE,
    typecom VARCHAR(50),
    statut VARCHAR(20),
    idref INT
);

INSERT INTO t1 (idcommande, date_, statut, typecom, idref) VALUES
('1001', '2024-09-15', 'in progress', 'in progress', 1),  -- don't want
('1001', '2024-09-15', 'in progress', 'canceled', 2),     -- keep
('1002', '2024-09-16', 'completed', 'sent', 3),           -- keep
('1003', '2024-09-16', 'completed', 'sent', NULL),         -- don't want           
('1003', '2024-09-17', 'completed', 'sent', 4);           -- keep
('1004', '2024-09-15', 'completed', 'canceled', 5),     -- keep

I tried to do this but I don’t really understand how to use partition. The lines with rownum at 1 are kept but the line with idref at null is also kept even though I don’t want it

WITH RankedResults AS (
   select *,
       ROW_NUMBER() OVER (
           PARTITION BY t1.idcommande 
           ORDER BY 
               CASE 
                   WHEN t1.statut = 'in progress' and t1.typeCom != 'canceled' THEN 1 
                   WHEN t1.idref IS NULL THEN 1
                   ELSE 0 
               END,
               t1.date_, 
               t1.typecom
       ) AS RowNum
   FROM (
       SELECT *
       FROM t1
    ) as t1
)
 
select * from (SELECT * FROM RankedResults) as tmain

Résult:

Tags: postgresql sql

Answers

Salut @dia05!

Check your conditions on the CASE statement:
- t1.typecom = 'canceled' (instead of != 'canceled')
- idref IS NOT NULL to prioritize rows where idref is present (instead of idref IS NULL).
- ELSE 3

WITH RankedResults AS (
  SELECT
    t1.*,
    ROW_NUMBER() OVER ( PARTITION BY t1.idcommande ORDER BY
      CASE
        WHEN t1.statut = 'in progress' AND t1.typecom = 'canceled' THEN 1
        WHEN t1.idref IS NOT NULL THEN 2
        ELSE 3
      END
    ) AS rownum
  FROM t1
)
SELECT
  id, idcommande, date_, typecom, statut, idref, RowNum,
  CASE
    WHEN cnt = 1 THEN 'keep'
    WHEN RowNum = 1 THEN 'keep'
    ELSE 'do not keep'
  END AS flag
FROM RankedResults
ORDER BY idcommande, RowNum

Output:

id	idcommande	date_	typecom	statut	idref	rownum	flag
2	1001	2024-09-15	canceled	in progress	2	1	keep
1	1001	2024-09-15	in progress	in progress	1	2	do not keep
3	1002	2024-09-16	sent	completed	3	1	keep
4	1002	2024-09-16	sent	completed	NULL	2	do not keeo
5	1003	2024-09-17	sent	completed	4	1	keep

- MohammedAwad
- September 19, 2024 at 12:49 pm
- 0 votes
0
hope this would help
I would suggest checking the case statement the typeCom should by =’cancel’ instead of != :

WHEN t1.statut = ‘in progress’ and t1.typeCom != ‘canceled’THEN 1
WHEN t1.idref IS NULL THEN 1 ELSE 0

I think this query should help you:
I separated each criteria on its own condition then prioritize the data on the order by statement
SELECT idcommande, date_, statut, typecom, idref,case when RN=1 then 'keep' else 'remove'END Keep FROM ( SELECT * ,Row_number() Over (Partition by idcommande order by idcommande,iif(statut='in progress',1,2),iif(typecom='canceled',1,2),iif(idref is not null,1,2)) RN FROM t1 )A
Output:

enter image description here
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – identified duplicates according to criteria

Answers