Mysql - SELECT duplicate values excluding most frequently used

uPaymeiFixit
May 7, 2024
141 views
0 votes
2 Answers

I have two relevant tables in MySQL that look something like this:

marker table

id	SEASONCD	ITEMCD	PRICETYPECD
1	foo	bar	baz
2	foo	bar	baz
3	foo	bar	baz
4	qux	bar	baz
5	qux	bar	baz
6	spam	eggs	ham

seat_marker table

id	MARKER_ID
1	1
2	1
3	1
4	5
5	6

I would like to select all of the rows in the marker table that are duplicates, excluding the one that is most frequently referenced / joined by the seat_marker table. In other words, I would like to list all duplicates excluding the "original", where "original" is defined as the row with the most occurrences in seat_marker.MARKER_ID.

The result would look like this:

id	SEASONCD	ITEMCD	PRICETYPECD
2	foo	bar	baz
3	foo	bar	baz
5	qux	bar	baz

We can see this is the original data from the marker table, but with three rows removed;

id 1 was removed because it was the foo-bar-baz duplicate that occurred more frequently in seat_marker than the other foo-bar-baz rows.
id 4 was removed because it was a duplicate. Since the qux-bar-baz rows show up 0 times in seat_marker, I don’t have a preference if we remove id 4 or id 5. I chose id 4 at random.
id 6 was removed because it is not a duplicate

The closest I’ve been able to get is this, which lists all values that are a duplicate and the number of assignments it has. But does not exclude the "originals".

-- Select all duplicates, and list the number of times it's referenced in the seat_marker table
SELECT 
  n1.id, n1.SEASONCD, n1.ITEMCD, n1.PRICETYPECD, (
    SELECT COUNT(1) 
    FROM seat_marker 
    WHERE MARKER_ID = n1.id
  ) active_assignments
FROM marker n1, marker n2 
WHERE
    n1.id <> n2.id AND 
    n1.SEASONCD = n2.SEASONCD AND 
    n1.ITEMCD = n2.ITEMCD AND
    n1.PRICETYPECD = n2.PRICETYPECD
GROUP BY id, SEASONCD, ITEMCD, PRICETYPECD, active_assignments
ORDER BY SEASONCD, ITEMCD, PRICETYPECD, active_assignments DESC, id;

Tags: mysql sql

Answers

Based on the modifications you made in the question, you can use the following code:

    Select * From marker 
    inner join (Select CONCAT_WS(',',SEASONCD,ITEMCD,PRICETYPECD) as xx 
        From marker Group by CONCAT_WS(',',SEASONCD,ITEMCD,PRICETYPECD) having 
        Count(xx)>1) as NewDT
    on CONCAT_WS(',',SEASONCD,ITEMCD,PRICETYPECD) = NewDT.xx
    Where  id <>(Select Marker_ID From seat_marker Group by Marker_ID order by 
        Count(*) desc limit 1)
    Order by id

- user1191247
- May 7, 2024 at 3:59 pm
- 0 votes
0
The expected result included in your question appears to be incorrect. It should be displaying 4 instead of 5, as 5 has one corresponding row in seat_marker and is therefor the "row with the most occurrences".

One way to do this is with COUNT and ROW_NUMBER():
```
WITH cte AS (
    SELECT m.*,
        ROW_NUMBER() OVER (PARTITION BY SEASONCD, ITEMCD, PRICETYPECD ORDER BY COUNT(sm.id) DESC, m.id) AS rn
    FROM marker m
    LEFT JOIN seat_marker sm ON m.id = sm.MARKER_ID
    GROUP BY m.id
)
SELECT * FROM cte WHERE rn > 1;
```
Output:

id SEASONCD ITEMCD PRICETYPECD rn

2 foo bar baz 2

3 foo bar baz 3

4 qux bar baz 2

Here’s a db<>fiddle.
Login or Signup to reply.

id	SEASONCD	ITEMCD	PRICETYPECD	rn
2	foo	bar	baz	2
3	foo	bar	baz	3
4	qux	bar	baz	2

Please signup or login to give your own answer.

Click here to cancel reply.

Mysql – SELECT duplicate values excluding most frequently used

Answers