skip to Main Content

I have a table with records in it. The desire was to find the records that were exactly the same, except for the time of the 2 records have a difference of less than 2 seconds.

I created a query that works, but as a join, I’d like to switch it to a group by so I can have the records listed vertically, rather than having to go across the entire row to see both records.

I found a solution using a join, but I really would like to see what a solution could be with a group by. Or any solution that shows both records vertically.

This is the query I have that works

SELECT p1.*, p2.*
FROM punches_history p1
INNER JOIN punches_history p2 ON 
    p1.ClockNumber = p2.ClockNumber AND  
    p1.PartNumber = p2.PartNumber  AND  
    p1.Quantity = p2.Quantity  AND 
    p1.YYMMDD = p2.YYMMDD AND  
    ABS(p1.TIMED - p2.TIMED) < 2  
WHERE p1.ClockNumber != '10' AND p1.id = (p2.id + 1) AND p1.CardColor = 'BLU'
ORDER BY p1.YYMMDD desc, p1.ClockNumber desc

The WHERE stipulation that needs explaining is p1.id = (p2.id + 1) the reason for this logic is so if records 20 and 21 are found records 21, and 20 aren’t also found.

2

Answers


  1. Here is a way I think you can pivot this vertically. Take the id’s from the two columns and join back on the table using an OR.

    SELECT ph.*
        FROM (
            SELECT p1.id AS id, p2.id AS id2 
            FROM punches_history p1
                 INNER JOIN punches_history p2 ON
                        p1.ClockNumber = p2.ClockNumber AND
                        p1.PartNumber = p2.PartNumber  AND
                        p1.Quantity = p2.Quantity  AND
                        p1.YYMMDD = p2.YYMMDD AND
                        ABS(p1.TIMED - p2.TIMED) < 2
            WHERE p1.ClockNumber != '10' AND p1.id > p2.id AND p1.CardColor = 'BLU'
            ORDER BY p1.YYMMDD DESC, p1.ClockNumber DESC
        ) matches
    JOIN punches_history ph ON ph.id = matches.id OR ph.id = matches.id2
    ORDER BY ph.YYMMDD DESC, ph.ClockNumber DESC, ph.id
    

    You might continue from there to group in some way if you wanted to.

    Login or Signup to reply.
  2. As I read your question, you want to identify consecutive rows having a set of similar columns with a lesser than 2 seconds gaps – which suggest a gaps-and-island approach.

    The idea is to compare the values of the related columns across rows using lag() to identify islands.

    Here is a query that works for "duplicates" in columns ClockNumber and PartNumber:

    select ClockNumber, PartNumber, 
        min(id) from_id, max(id) to_id,
        min(timed) from_timed, max(timed) to_timed,
        count(*) cnt_records_in_group
    from (
        select p.*,
            sum(case 
                when LagClockNumber = ClockNumber and LagPartNumber = PartNumbee and Timed - LagTimed < 2
                then 0 else 1
            end) over(order by id) grp               
        from (
            select p.*,
                lag(Timed)       over(order by id) LagTimed,
                lag(ClockNumber) over(order by id) LagClockNumber,
                lag(PartNumber)  over(order by id) LagPartNumber
            from punches_history p
            where ClockNumber != 10 and CardColor = 'BLU'
        ) p
    ) p
    group by ClockNumber, PartNumber, grp
    having count(*) > 1
    

    This brings one row for each group of (consecutive) duplicate rows, along with starting and ending id and timed, and count of duplicates per group.

    You would typically extend the query with more lag()s and more conditions in thé case expression to add more columns in the duplicate detection logic (Quantity
    and YYMMDD).

    Note: presumably Timed is always increasing, so abs() is not needed.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search