How to find gaps in an int column (grouped by another column) in MySQL?

eRIZ
May 31, 2024
199 views
3 votes
3 Answers

There are numerous SQL questions and answers about finding gaps in sequences, but I could not find any that have the wrinkle my situation has: I need to group the sequence by another column.

My table (simplified)

id | parent_id | list_index
1  | 987       | 0
2  | 987       | 1
3  | 987       | 2

4  | 654       | 0
5  | 654       | 2

7  | 321       | 1
8  | 321       | 2
9  | 321       | 4
10 | 321       | 5

I added the empty lines to visualize the grouping.

The business rule is that for any parent_id there should be a set of rows with list_index values in a contiguous sequence starting at 0. However, the data has gaps and it’s those gaps I need to find. Specifically, I need to identify the parent_ids that have a gap.

In the sample data above, I would need the query to identify parent ids 654 and 321 because 654 is missing a row with list_index 1 and parent 321 is missing rows with list_index 0 and 3. Parent id 987 should not be included because it’s rows have a contiguous sequence.

I do not care what the gap looks like (eg, where it starts or ends), just which parentids are missing 1 or more rows.

Answers

Gaps in your case can be considered separately for each group (parent_id).
Then the task becomes a regular gap-finding task again.

See examples.

Full details about gap

select *
from(
  select *
    ,case when prev_list_index is null and list_index<>0 then 'start not 0'
        when (prev_list_index+1)<>list_index then concat('missing ',(list_index-1))
     else null
    end isGap
  from(
    select * 
      ,lag(list_index)over(partition by parent_id order by list_index) prev_list_index
    from test
  )a
)b
where isGap is not null

Output is

id	parent_id	list_index	prev_list_index	isGap
7	321	1	null	start not 0
9	321	4	2	missing 3
5	654	2	0	missing 1

Or take 1 row for parent_id

select parent_id,min(id) id,group_concat(isGap order by isGap) gaps
from(
  select *
    ,case when prev_list_index is null and list_index<>0 then 0
        when (prev_list_index+1)<>list_index then (list_index-1)
     else null
    end isGap
  from(
    select * 
      ,lag(list_index)over(partition by parent_id order by list_index) prev_list_index
    from test
  )a
)b
where isGap is not null
group by parent_id

parent_id	id	gaps
321	7	0,3
654	5	1

Query sample without any details

select parent_id
from test
group by parent_id
having count(*)<>(max(list_index)+1)

parent_id
654
321

- ColeStansbury
- May 31, 2024 at 7:57 pm
- 0 votes
0
We can use math and partitioning to solve this.
1. We will need a table expression to use some window functions: "indexed_data"
2. Track a row number and partition by parent_id
3. Track the lowest list_index for a parent_id (we will use this as a counting base to detect sequentiality from this arbitrary starting point)
4. New table expression "gap_check" that looks for all parent ids where the actual list_index does not equal the expected list index (should the numbers be sequential) – hence indicating that the row is not sequential
5. Distinctly display all parent ids that passed the gap_check and therefor have gaps
```
WITH indexed_data AS (SELECT id,
                             parent_id,
                             list_index,
                             ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY list_index) AS rn,
                             MIN(list_index) OVER (PARTITION BY parent_id)                  AS min_list_index
                      FROM your_table),
     gap_check AS (SELECT parent_id FROM indexed_data WHERE list_index <> (min_list_index + rn - 1))
SELECT DISTINCT parent_id
FROM gap_check;
```
Login or Signup to reply.

DBFiddleUk Demo

Premise: use a window function Row Number() and order by your list index and identify records where the row_number assigned-1 isn’t equal to the list index.

CTE- your data
CTE2- your data with a row_numer assigned
Final query against CTE2 gives you the group and a count of records which don’t match.

With CTE as(
Select 1 ID, 987 parent_Id, 0 list_index UNION ALL
Select 2 ID, 987 parent_Id, 1 list_index UNION ALL
Select 3 ID, 987 parent_Id, 2 list_index UNION ALL
Select 4 ID, 654 parent_Id, 0 list_index UNION ALL
Select 5 ID, 654 parent_Id, 2 list_index UNION ALL
Select 7 ID, 321 parent_Id, 1 list_index UNION ALL
Select 8 ID, 321 parent_Id, 2 list_index UNION ALL
Select 9 ID, 321 parent_Id, 4 list_index UNION ALL
Select 10 ID, 321 parent_Id, 5 list_index),
CTE2 as (
SELECT A.*, row_number() over (partition by parent_ID order by list_index)-1 RN
  FROM CTE A)

SELECT Parent_ID, count(*) from CTE2 where Rn <> List_index
GROUP BY Parent_ID

Giving us:

+-----------+----------+
| Parent_ID | count(*) |
+-----------+----------+
|       321 |        4 |
|       654 |        1 |
+-----------+----------+

Please signup or login to give your own answer.

Click here to cancel reply.