skip to Main Content

This is my sample record set.
I want to SELECT (DELETE after confirmation) all the consecutive duplicate records having the same lid, same sid, same stype, same bid and 0 or null noOfDays. The lastest record should be preserved and all earlier ones should be removed.

rid lid sid stype uid urole bid rdate ndays
1 123 106963 P 0 M 5243 2024-06-24 00:00:00 1
2 123 106977 P 0 M 5243 2024-06-24 00:00:00 1
3 123 114234 P 0 M 5243 2024-06-24 00:00:00 2
4 123 106973 P 0 M 5243 2024-06-25 00:00:00 9
5 123 106975 P 0 M 5243 2024-07-03 00:00:00 1
6 123 106964 P 0 M 5243 2024-07-03 00:00:00 1
7 123 107394 P 0 M 5243 2024-07-03 00:00:00 1
8 123 106964 P 0 M 5243 2024-07-03 00:00:00 3
9 123 107394 P 0 M 5243 2024-07-05 00:00:00 4
10 123 107395 P 0 M 5243 2024-07-08 00:00:00 25
11 123 106964 P 0 M 5243 2024-08-01 00:00:00 1
12 123 107394 P 0 M 5243 2024-08-01 00:00:00 2
13 123 107395 P 0 M 5243 2024-08-02 00:00:00 0
14 123 107395 P 0 M 5243 2024-08-02 00:00:00 0
15 123 107395 P 0 M 5243 2024-08-02 00:00:00 7
16 123 106971 P 0 M 5243 2024-08-08 00:00:00 5
17 123 106974 P 0 M 5243 2024-08-12 00:00:00 1
18 123 106976 P 0 M 5243 2024-08-12 00:00:00 3
19 123 106980 P 0 M 5243 2024-08-14 00:00:00 0

Expected Output for the give dataset:

rid lid sid stype uid urole bid rdate ndays
13 123 107395 P 0 M 5243 2024-08-02 00:00:00 0
14 123 107395 P 0 M 5243 2024-08-02 00:00:00 0

Explanation:
Rows with rid 13, 14 and 15 have the same lid, same sid, same stype, same bid and 0 noOfDays. Also they are coming one after another in sequence so I want to keep the latest (rid 15) and remove the other two (rid 13 and 14).

2

Answers


  1. Looks like you need in the next query:

    SELECT t1.*
    FROM table t1
    JOIN table t2 USING (lid, sid, stype, uid, urole, bid, rdate)
    WHERE t1.ndays = 0
      AND t1.rid = t2.rid - 1
    
    Login or Signup to reply.
  2. Select Consecutive Duplicate Records

    The following query will help you select consecutive duplicate records, excluding the latest one (highest rid).

    SELECT *
    FROM your_table t1
    JOIN (
        SELECT MAX(rid) AS latest_rid, lid, sid, stype, bid
        FROM your_table
        WHERE (ndays = 0 OR ndays IS NULL)
        GROUP BY lid, sid, stype, bid
        HAVING COUNT(*) > 1
    ) AS t2 ON t1.lid = t2.lid
          AND t1.sid = t2.sid
          AND t1.stype = t2.stype
          AND t1.bid = t2.bid
          AND t1.rid < t2.latest_rid
    ORDER BY t1.lid, t1.sid, t1.rdate;
    

    Confirm and Delete the Selected Records

    Once you’ve confirmed the records to delete, you can replace the SELECT * statement with DELETE to remove them:

    DELETE t1
    FROM your_table t1
    JOIN (
        SELECT MAX(rid) AS latest_rid, lid, sid, stype, bid
        FROM your_table
        WHERE (ndays = 0 OR ndays IS NULL)
        GROUP BY lid, sid, stype, bid
        HAVING COUNT(*) > 1
    ) AS t2 ON t1.lid = t2.lid
          AND t1.sid = t2.sid
          AND t1.stype = t2.stype
          AND t1.bid = t2.bid
          AND t1.rid < t2.latest_rid;
    

    This DELETE command will remove all consecutive duplicates while keeping only the latest one in each sequence. Ensure to back up your data before running this to avoid accidental data loss.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search