skip to Main Content

There are numerous SQL questions and answers about finding gaps in sequences, but I could not find any that have the wrinkle my situation has: I need to group the sequence by another column.

My table (simplified)

id | parent_id | list_index
1  | 987       | 0
2  | 987       | 1
3  | 987       | 2

4  | 654       | 0
5  | 654       | 2

7  | 321       | 1
8  | 321       | 2
9  | 321       | 4
10 | 321       | 5

I added the empty lines to visualize the grouping.

The business rule is that for any parent_id there should be a set of rows with list_index values in a contiguous sequence starting at 0. However, the data has gaps and it’s those gaps I need to find. Specifically, I need to identify the parent_ids that have a gap.

In the sample data above, I would need the query to identify parent ids 654 and 321 because 654 is missing a row with list_index 1 and parent 321 is missing rows with list_index 0 and 3. Parent id 987 should not be included because it’s rows have a contiguous sequence.

I do not care what the gap looks like (eg, where it starts or ends), just which parentids are missing 1 or more rows.

3

Answers


  1. Gaps in your case can be considered separately for each group (parent_id).
    Then the task becomes a regular gap-finding task again.

    See examples.

    1. Full details about gap
    select *
    from(
      select *
        ,case when prev_list_index is null and list_index<>0 then 'start not 0'
            when (prev_list_index+1)<>list_index then concat('missing ',(list_index-1))
         else null
        end isGap
      from(
        select * 
          ,lag(list_index)over(partition by parent_id order by list_index) prev_list_index
        from test
      )a
    )b
    where isGap is not null
    

    Output is

    id parent_id list_index prev_list_index isGap
    7 321 1 null start not 0
    9 321 4 2 missing 3
    5 654 2 0 missing 1
    1. Or take 1 row for parent_id
    select parent_id,min(id) id,group_concat(isGap order by isGap) gaps
    from(
      select *
        ,case when prev_list_index is null and list_index<>0 then 0
            when (prev_list_index+1)<>list_index then (list_index-1)
         else null
        end isGap
      from(
        select * 
          ,lag(list_index)over(partition by parent_id order by list_index) prev_list_index
        from test
      )a
    )b
    where isGap is not null
    group by parent_id
    
    parent_id id gaps
    321 7 0,3
    654 5 1
    1. Query sample without any details
    select parent_id
    from test
    group by parent_id
    having count(*)<>(max(list_index)+1)
    
    parent_id
    654
    321
    Login or Signup to reply.
  2. We can use math and partitioning to solve this.

    1. We will need a table expression to use some window functions: "indexed_data"
    2. Track a row number and partition by parent_id
    3. Track the lowest list_index for a parent_id (we will use this as a counting base to detect sequentiality from this arbitrary starting point)
    4. New table expression "gap_check" that looks for all parent ids where the actual list_index does not equal the expected list index (should the numbers be sequential) – hence indicating that the row is not sequential
    5. Distinctly display all parent ids that passed the gap_check and therefor have gaps
    WITH indexed_data AS (SELECT id,
                                 parent_id,
                                 list_index,
                                 ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY list_index) AS rn,
                                 MIN(list_index) OVER (PARTITION BY parent_id)                  AS min_list_index
                          FROM your_table),
         gap_check AS (SELECT parent_id FROM indexed_data WHERE list_index <> (min_list_index + rn - 1))
    SELECT DISTINCT parent_id
    FROM gap_check;
    
    Login or Signup to reply.
  3. DBFiddleUk Demo

    Premise: use a window function Row Number() and order by your list index and identify records where the row_number assigned-1 isn’t equal to the list index.

    CTE- your data
    CTE2- your data with a row_numer assigned
    Final query against CTE2 gives you the group and a count of records which don’t match.

    With CTE as(
    Select 1 ID, 987 parent_Id, 0 list_index UNION ALL
    Select 2 ID, 987 parent_Id, 1 list_index UNION ALL
    Select 3 ID, 987 parent_Id, 2 list_index UNION ALL
    Select 4 ID, 654 parent_Id, 0 list_index UNION ALL
    Select 5 ID, 654 parent_Id, 2 list_index UNION ALL
    Select 7 ID, 321 parent_Id, 1 list_index UNION ALL
    Select 8 ID, 321 parent_Id, 2 list_index UNION ALL
    Select 9 ID, 321 parent_Id, 4 list_index UNION ALL
    Select 10 ID, 321 parent_Id, 5 list_index),
    CTE2 as (
    SELECT A.*, row_number() over (partition by parent_ID order by list_index)-1 RN
      FROM CTE A)
    
    SELECT Parent_ID, count(*) from CTE2 where Rn <> List_index
    GROUP BY Parent_ID
    

    Giving us:

    +-----------+----------+
    | Parent_ID | count(*) |
    +-----------+----------+
    |       321 |        4 |
    |       654 |        1 |
    +-----------+----------+
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search