
See the concrete example below.

For each id, there are several time intervals. I want to count the overlaps: the start_time of the current interval is compared with the end_time of every previous interval, and each case where start_time < end_time counts as one overlap.
For example, for id=1:

     3.5<4 (the previous end_time)
     3.8<4, 3.8<4.5
     3.9<4, 3.9<4.5, 3.9<4.8

That gives 1 + 2 + 3 = 6 overlaps for id=1. For id=2, since 4.5 > 4, the number of overlaps is zero.

id    start_time   end_time
1        3           4
1       3.5         4.5
1       3.8         4.8
1       3.9          5
2       2            4
2       4.5          5
...
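
The counting rule above can be checked with a small brute-force sketch. It is written in Python rather than MySQL purely so it can be run standalone, and it assumes "previous" means earlier when the rows of one id are ordered by start_time (the sample rows are already in that order). The names `rows` and `count_overlaps` are made up for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (id, start_time, end_time).
rows = [
    (1, 3.0, 4.0),
    (1, 3.5, 4.5),
    (1, 3.8, 4.8),
    (1, 3.9, 5.0),
    (2, 2.0, 4.0),
    (2, 4.5, 5.0),
]


def count_overlaps(rows):
    """Per id, count pairs (previous, current) with current.start < previous.end."""
    by_id = defaultdict(list)
    for id_, start, end in rows:
        by_id[id_].append((start, end))

    counts = {}
    for id_, intervals in by_id.items():
        # Assumption: "previous" is defined by start_time order.
        intervals.sort()
        total = 0
        for i, (start, _) in enumerate(intervals):
            # Compare the current start with the end of every earlier interval.
            total += sum(1 for _, prev_end in intervals[:i] if start < prev_end)
        counts[id_] = total
    return counts


print(count_overlaps(rows))  # {1: 6, 2: 0}
```

This is quadratic in the number of intervals per id, so it is only meant as a reference for the expected numbers, not as a replacement for a set-based query.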

I was planning to use the LAG() function in MySQL, but I realized that with its offset argument I can only compare the current row with the row a fixed number of positions before it. Is there a good way to solve this problem in MySQL? Thank you.

2 Answers


  1. Try this:

    SELECT DISTINCT a.id
    FROM your_table a
    INNER JOIN (
      SELECT id,
             MAX(start_time) AS maxStartTime,
             MIN(end_time) AS minEndTime
      FROM your_table
      WHERE start_time < end_time
      GROUP BY id
    ) b ON a.id = b.id
    WHERE b.maxStartTime < b.minEndTime;
    
  2. A possible algorithm: combine the starts and ends into one rowset and add a weight, +1 for starts and -1 for ends. Then calculate the cumulative sum of these weights. The number of rows with a cumulative sum of zero is the output you need.

    Look at the task from the following perspective. Imagine a room: a person enters the room at start_time and leaves it at end_time. During overlapping time intervals there are one or more people in the room. The room is empty between the overlapping ranges.

    WITH 
    cte1 (id, timepoint, weight) AS (
      SELECT id, start_time, 1 FROM test
      UNION ALL
      SELECT id, end_time, -1 FROM test
      ),
    cte2 AS (
      SELECT id, timepoint, SUM(weight) OVER (PARTITION BY id ORDER BY timepoint) cumsum
      FROM cte1
      )
    SELECT id, COUNT(DISTINCT CASE WHEN cumsum = 0 THEN timepoint END) amount
    FROM cte2
    GROUP BY 1;
    

    COUNT(DISTINCT) in the outer query is needed so that rows sharing the same end_time value are counted only once.

    You may need to treat adjacent intervals as separate rather than combined, i.e. 1-3 and 3-5 are 2 intervals, not one continuous interval. In that case, adjust end_time in cte1 by subtracting a small time value from it (smaller than the time accuracy/granularity).
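
The weight and cumulative-sum idea, including the adjustment for adjacent intervals, can be sketched outside the database as follows. This is a Python approximation of the query above, not MySQL code; the names `count_groups` and `epsilon` are invented for illustration. Events that share a timepoint are summed together before the running total is updated, which is an assumption about how the window SUM with ORDER BY treats tied rows under its default frame.

```python
from collections import defaultdict

# Sample rows from the question: (id, start_time, end_time).
rows = [
    (1, 3.0, 4.0),
    (1, 3.5, 4.5),
    (1, 3.8, 4.8),
    (1, 3.9, 5.0),
    (2, 2.0, 4.0),
    (2, 4.5, 5.0),
]


def count_groups(rows, epsilon=0.0):
    """Per id, count the distinct timepoints at which the running weight is zero.

    epsilon is subtracted from every end_time; a small positive value makes
    adjacent intervals (1-3 and 3-5) count as separate groups.
    """
    # id -> timepoint -> net weight (+1 per start, -1 per end)
    weights = defaultdict(lambda: defaultdict(int))
    for id_, start, end in rows:
        weights[id_][start] += 1
        weights[id_][end - epsilon] -= 1

    result = {}
    for id_, per_point in weights.items():
        running = 0
        zeros = 0
        for timepoint in sorted(per_point):
            running += per_point[timepoint]
            if running == 0:  # the "room" is empty after this timepoint
                zeros += 1
        result[id_] = zeros
    return result


print(count_groups(rows))                         # {1: 1, 2: 2}
print(count_groups([(1, 1, 3), (1, 3, 5)]))       # {1: 1}
print(count_groups([(1, 1, 3), (1, 3, 5)], 0.5))  # {1: 2}
```

On the sample data this yields 1 for id=1 and 2 for id=2, i.e. the number of separate groups of overlapping intervals per id, so it is worth checking whether that is the quantity wanted before adopting the approach.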

