skip to Main Content

I have two relevant tables in MySQL that look something like this:

marker table

id SEASONCD ITEMCD PRICETYPECD
1 foo bar baz
2 foo bar baz
3 foo bar baz
4 qux bar baz
5 qux bar baz
6 spam eggs ham

seat_marker table

id MARKER_ID
1 1
2 1
3 1
4 5
5 6

I would like to select all of the rows in the marker table that are duplicates, excluding the one that is most frequently referenced / joined by the seat_marker table. In other words, I would like to list all duplicates excluding the "original", where "original" is defined as the row with the most occurrences in seat_marker.MARKER_ID.

The result would look like this:

id SEASONCD ITEMCD PRICETYPECD
2 foo bar baz
3 foo bar baz
5 qux bar baz

We can see this is the original data from the marker table, but with three rows removed;

  • id 1 was removed because it was the foo-bar-baz duplicate that occurred more frequently in seat_marker than the other foo-bar-baz rows.
  • id 4 was removed because it was a duplicate. Since the qux-bar-baz rows show up 0 times in seat_marker, I don’t have a preference if we remove id 4 or id 5. I chose id 4 at random.
  • id 6 was removed because it is not a duplicate

The closest I’ve been able to get is this, which lists all values that are a duplicate and the number of assignments it has. But does not exclude the "originals".

-- Select all duplicates, and list the number of times it's referenced in the seat_marker table
SELECT 
  n1.id, n1.SEASONCD, n1.ITEMCD, n1.PRICETYPECD, (
    SELECT COUNT(1) 
    FROM seat_marker 
    WHERE MARKER_ID = n1.id
  ) active_assignments
FROM marker n1, marker n2 
WHERE
    n1.id <> n2.id AND 
    n1.SEASONCD = n2.SEASONCD AND 
    n1.ITEMCD = n2.ITEMCD AND
    n1.PRICETYPECD = n2.PRICETYPECD
GROUP BY id, SEASONCD, ITEMCD, PRICETYPECD, active_assignments
ORDER BY SEASONCD, ITEMCD, PRICETYPECD, active_assignments DESC, id;

2

Answers


  1. Based on the modifications you made in the question, you can use the following code:

        Select * From marker 
        inner join (Select CONCAT_WS(',',SEASONCD,ITEMCD,PRICETYPECD) as xx 
            From marker Group by CONCAT_WS(',',SEASONCD,ITEMCD,PRICETYPECD) having 
            Count(xx)>1) as NewDT
        on CONCAT_WS(',',SEASONCD,ITEMCD,PRICETYPECD) = NewDT.xx
        Where  id <>(Select Marker_ID From seat_marker Group by Marker_ID order by 
            Count(*) desc limit 1)
        Order by id
    
    Login or Signup to reply.
  2. The expected result included in your question appears to be incorrect. It should be displaying 4 instead of 5, as 5 has one corresponding row in seat_marker and is therefor the "row with the most occurrences".

    One way to do this is with COUNT and ROW_NUMBER():

    WITH cte AS (
        SELECT m.*,
            ROW_NUMBER() OVER (PARTITION BY SEASONCD, ITEMCD, PRICETYPECD ORDER BY COUNT(sm.id) DESC, m.id) AS rn
        FROM marker m
        LEFT JOIN seat_marker sm ON m.id = sm.MARKER_ID
        GROUP BY m.id
    )
    SELECT * FROM cte WHERE rn > 1;
    

    Output:

    id SEASONCD ITEMCD PRICETYPECD rn
    2 foo bar baz 2
    3 foo bar baz 3
    4 qux bar baz 2

    Here’s a db<>fiddle.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search