How do I convert a JOIN on the same table into a Group By - Mysql

DaTurk
March 28, 2023
304 views
0 votes
2 Answers

I have a table with records in it. The desire was to find the records that were exactly the same, except for the time of the 2 records have a difference of less than 2 seconds.

I created a query that works, but as a join, I’d like to switch it to a group by so I can have the records listed vertically, rather than having to go across the entire row to see both records.

I found a solution using a join, but I really would like to see what a solution could be with a group by. Or any solution that shows both records vertically.

This is the query I have that works

SELECT p1.*, p2.*
FROM punches_history p1
INNER JOIN punches_history p2 ON 
    p1.ClockNumber = p2.ClockNumber AND  
    p1.PartNumber = p2.PartNumber  AND  
    p1.Quantity = p2.Quantity  AND 
    p1.YYMMDD = p2.YYMMDD AND  
    ABS(p1.TIMED - p2.TIMED) < 2  
WHERE p1.ClockNumber != '10' AND p1.id = (p2.id + 1) AND p1.CardColor = 'BLU'
ORDER BY p1.YYMMDD desc, p1.ClockNumber desc

The WHERE stipulation that needs explaining is p1.id = (p2.id + 1) the reason for this logic is so if records 20 and 21 are found records 21, and 20 aren’t also found.

Answers

Here is a way I think you can pivot this vertically. Take the id’s from the two columns and join back on the table using an OR.

SELECT ph.*
    FROM (
        SELECT p1.id AS id, p2.id AS id2 
        FROM punches_history p1
             INNER JOIN punches_history p2 ON
                    p1.ClockNumber = p2.ClockNumber AND
                    p1.PartNumber = p2.PartNumber  AND
                    p1.Quantity = p2.Quantity  AND
                    p1.YYMMDD = p2.YYMMDD AND
                    ABS(p1.TIMED - p2.TIMED) < 2
        WHERE p1.ClockNumber != '10' AND p1.id > p2.id AND p1.CardColor = 'BLU'
        ORDER BY p1.YYMMDD DESC, p1.ClockNumber DESC
    ) matches
JOIN punches_history ph ON ph.id = matches.id OR ph.id = matches.id2
ORDER BY ph.YYMMDD DESC, ph.ClockNumber DESC, ph.id

You might continue from there to group in some way if you wanted to.

- GMB
- March 28, 2023 at 11:25 pm
- 0 votes
0
As I read your question, you want to identify consecutive rows having a set of similar columns with a lesser than 2 seconds gaps – which suggest a gaps-and-island approach.

The idea is to compare the values of the related columns across rows using lag() to identify islands.

Here is a query that works for "duplicates" in columns ClockNumber and PartNumber:
```
select ClockNumber, PartNumber, 
    min(id) from_id, max(id) to_id,
    min(timed) from_timed, max(timed) to_timed,
    count(*) cnt_records_in_group
from (
    select p.*,
        sum(case 
            when LagClockNumber = ClockNumber and LagPartNumber = PartNumbee and Timed - LagTimed < 2
            then 0 else 1
        end) over(order by id) grp               
    from (
        select p.*,
            lag(Timed)       over(order by id) LagTimed,
            lag(ClockNumber) over(order by id) LagClockNumber,
            lag(PartNumber)  over(order by id) LagPartNumber
        from punches_history p
        where ClockNumber != 10 and CardColor = 'BLU'
    ) p
) p
group by ClockNumber, PartNumber, grp
having count(*) > 1
```
This brings one row for each group of (consecutive) duplicate rows, along with starting and ending id and timed, and count of duplicates per group.

You would typically extend the query with more lag()s and more conditions in thé case expression to add more columns in the duplicate detection logic (Quantity
and YYMMDD).

Note: presumably Timed is always increasing, so abs() is not needed.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

How do I convert a JOIN on the same table into a Group By – Mysql

Answers