skip to Main Content

I’m trying to group the rows of a table based on the time difference between them.

If the records differ by less than 30 days (time interval), they must be grouped together; otherwise, they are considered separate events. The start will be equal to the start date of the first record, and the end will be equal to the end date of the last record.

For example: The following table

id start end
1 2023-08-10 2023-09-01
1 2023-09-20 2023-10-14
1 2023-11-20 2023-12-27
2 2023-08-15 2023-09-05
2 2023-09-20 2023-10-14
2 2023-11-01 2023-12-20

should become like this:

id start end
1 2023-08-10 2023-10-14
1 2023-11-20 2023-12-27
2 2023-08-15 2023-12-20

For the first case, the first two records should be grouped together as:

start_2n_row – end_1st_row = 2023-09-20 – 2023-09-01 = 19 days

though the 3rd record should be considered seperate event.

I tried to test the LAG but the grouping was not working properly.

2

Answers


  1. Try this query, it groups rows in a PostgreSQL table according to a 30-day time period using recursion. For each group inside of each id, it determines the earliest start date and most recent end date. When using the query, substitute your_table for the name of your table.

    WITH RECURSIVE interval_groups AS (
      SELECT id, start, end, start AS group_start, end AS group_end, 0 AS group_num
      FROM your_table
      WHERE NOT EXISTS (
        SELECT 1
        FROM your_table prev
        WHERE prev.id = your_table.id
          AND prev.end >= your_table.start - INTERVAL '30 days'
          AND prev.start <= your_table.start
      )
      
      UNION ALL
      
      SELECT t.id, t.start, t.end,
             ig.group_start,
             CASE WHEN t.end > ig.group_end THEN t.end ELSE ig.group_end END,
             ig.group_num
      FROM your_table t
      JOIN interval_groups ig ON t.id = ig.id
        AND t.start <= ig.group_end + INTERVAL '30 days'
        AND t.end >= ig.group_start
    )
    SELECT id, MIN(group_start) AS start, MAX(group_end) AS end
    FROM interval_groups
    GROUP BY id, group_num
    ORDER BY id, start;
    

    Hope it works 🙂

    Login or Signup to reply.
  2. try this

    WITH RECURSIVE GroupCTE AS (
    SELECT
        id,
        start,
        end,
        1 AS group_id
    FROM your_table
    ORDER BY id, start
    
    UNION ALL
    
    SELECT
        t.id,
        t.start,
        t.end,
        CASE
            WHEN t.start - lag(g.end) OVER (PARTITION BY t.id ORDER BY t.start) 
      <= interval '30 days'
            THEN g.group_id
            ELSE g.group_id + 1
        END AS group_id
    FROM GroupCTE g
    JOIN your_table t ON g.id = t.id
    WHERE t.start >= g.start
    )
    SELECT
    id,
    MIN(start) AS start,
    MAX(end) AS end
    FROM GroupCTE
    GROUP BY id, group_id
    ORDER BY id, start;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search