PostgreSQL - The problem of extracting consecutive dates in an SQL query

anonymous
July 23, 2024

There is a table of customer purchases, each made in a certain month. Sometimes a customer buys nothing in a given month, so that month is skipped.

Task:
I need to extract only the rows of customers who have been shopping continuously, in every month without a gap.

I tried to create a flag, but I don't understand how to get rid of the first row of each customer, which always gets flag = 1. I managed to do it using GROUP BY and HAVING, but I would like to see a shorter query.

The DBMS used is PostgreSQL.

with cte as (
    select 
        client_id, 
        date_trunc('month', dt)::date as m_date, 
        sum(liters) as s_lit
    from azs 
    group by client_id, m_date
    order by 1, 2
), 
cte2 as (
    SELECT client_id, m_date, 
        -- flag = 1 on a client's first month (no previous month) and after a gap;
        -- comparing with the previous month + 1 month also works across years
        CASE WHEN lag(m_date) over w IS NULL THEN 1
             WHEN m_date > lag(m_date) over w + interval '1 month' THEN 1
             ELSE 0
        END AS flag
    FROM cte
    WINDOW w AS (partition by client_id order by m_date)
)
    
select * from cte2
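A minimal sketch of one way to finish the flag idea: each client's first month always gets flag = 1, so a client bought in every month exactly when its flags sum to 1, and a window `sum()` can filter the rows without a separate GROUP BY. Assumptions: the `azs(client_id, dt, liters)` columns are taken from the query above, the sample rows are hypothetical, the flag compares each month with the previous month plus one month (so December to January counts as consecutive), and the result is wrapped in a temporary view with the hypothetical name `continuous_rows` only so it can be queried again.

```sql
-- Hypothetical view name; the inner query works as a plain SELECT too.
CREATE TEMP VIEW continuous_rows AS
WITH azs (client_id, dt, liters) AS (
    -- hypothetical sample data: client 1 skips February 2024
    VALUES (1, DATE '2023-12-05', 10.0),
           (1, DATE '2024-01-07', 12.5),
           (1, DATE '2024-03-02', 8.0),
           (2, DATE '2023-12-20', 20.0),
           (2, DATE '2024-01-03', 15.0),
           (2, DATE '2024-01-25', 5.0),
           (2, DATE '2024-02-14', 9.0)
),
monthly AS (
    -- one row per client and month
    SELECT client_id,
           date_trunc('month', dt)::date AS m_date,
           sum(liters) AS s_lit
    FROM azs
    GROUP BY client_id, m_date
),
flagged AS (
    -- flag = 1 on a client's first month (lag is NULL) and after every gap
    SELECT client_id, m_date, s_lit,
           CASE WHEN m_date = lag(m_date) OVER w + interval '1 month'
                THEN 0 ELSE 1 END AS flag
    FROM monthly
    WINDOW w AS (PARTITION BY client_id ORDER BY m_date)
),
counted AS (
    -- number of flags per client, repeated on every row of that client
    SELECT *, sum(flag) OVER (PARTITION BY client_id) AS n_flags
    FROM flagged
)
SELECT client_id, m_date, s_lit
FROM counted
WHERE n_flags = 1;  -- only the unavoidable first-month flag

SELECT * FROM continuous_rows ORDER BY client_id, m_date;
```

Because the flag total is computed as a window function, the monthly rows themselves are returned rather than just client IDs; with the sample data only client 2 remains.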

Answers

You can use common table expressions (CTEs) to first group the data by client and month, then use the window function lag() to find each client's previous purchase month, and finally keep only the customers with no gaps. Try this:

WITH monthly_purchases AS (
    SELECT 
        client_id, 
        date_trunc('month', dt)::date AS m_date,
        sum(liters) AS s_lit
    FROM azs
    GROUP BY client_id, m_date
),
gaps AS (
    SELECT 
        client_id,
        m_date,
        lag(m_date) OVER (PARTITION BY client_id ORDER BY m_date) AS prev_m_date
    FROM monthly_purchases
),
continuous_customers AS (
    -- keep a client only if every month directly follows the previous one;
    -- a client's first month has no predecessor (prev_m_date IS NULL),
    -- and adding an interval also handles December to January correctly
    SELECT 
        client_id
    FROM gaps
    GROUP BY client_id
    HAVING bool_and(prev_m_date IS NULL OR m_date = prev_m_date + interval '1 month')
)
SELECT 
    mp.client_id, 
    mp.m_date, 
    mp.s_lit
FROM 
    monthly_purchases mp
JOIN 
    continuous_customers cc ON mp.client_id = cc.client_id
ORDER BY 
    mp.client_id, 
    mp.m_date;
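Subtracting only the month fields, as `date_part('month', ...)` differences do, is ambiguous once purchases span more than one calendar year: December 2023 followed by January 2024 gives 1 - 12 = -11, and January 2023 followed by February 2024 gives 1 even though a year is missing. A minimal sketch of a common alternative, a running month number year * 12 + month, under which consecutive months always differ by exactly 1; the sample months and the temporary view name `month_steps` are hypothetical, and the view only lets the result be queried again.

```sql
-- Hypothetical view name; the inner query works as a plain SELECT too.
CREATE TEMP VIEW month_steps AS
WITH months (m_date) AS (
    -- hypothetical consecutive months crossing a year boundary
    VALUES (DATE '2023-11-01'), (DATE '2023-12-01'), (DATE '2024-01-01')
),
indexed AS (
    SELECT m_date,
           date_part('month', m_date) AS month_field,
           -- running month number: consecutive months differ by exactly 1
           date_part('year', m_date) * 12 + date_part('month', m_date) AS month_idx
    FROM months
)
SELECT m_date,
       month_field - lag(month_field) OVER (ORDER BY m_date) AS month_field_diff,
       month_idx - lag(month_idx) OVER (ORDER BY m_date) AS month_idx_diff
FROM indexed;

SELECT * FROM month_steps ORDER BY m_date;
```

In a per-client query the `lag()` window would also need `PARTITION BY client_id`. Comparing `m_date` with the previous month plus `interval '1 month'` is an equivalent check that avoids the arithmetic.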

"I managed to do it using GROUP BY and HAVING, but I would like to see a shorter query."

I believe it will be hard to avoid a GROUP BY for this task, but it can be done with a single WITH clause before the GROUP BY.

In the query below, client_id 1 has no orders in March, client_id 3 has no orders in February, and client_id 4 has a single order in March. Hence the expected output is client_ids 2 and 4.

The query was tested in a Postgres playground.

WITH azs AS (
    SELECT 1 AS client_id, DATE '2023-11-15' AS dt
    UNION ALL
    SELECT 1, DATE '2023-12-10'
    UNION ALL
    SELECT 1, DATE '2024-01-12'
    UNION ALL
    SELECT 1, DATE '2024-02-20'
    UNION ALL
    SELECT 1, DATE '2024-04-10' -- Missing March
    UNION ALL
    SELECT 2, DATE '2023-11-12'
    UNION ALL
    SELECT 2, DATE '2023-12-11'
    UNION ALL
    SELECT 2, DATE '2024-01-15'
    UNION ALL
    SELECT 2, DATE '2024-02-16'
    UNION ALL
    SELECT 3, DATE '2024-01-20'
    UNION ALL
    SELECT 3, DATE '2024-03-20'  -- Missing February
    UNION ALL
    SELECT 4, DATE '2024-03-21'   -- Single month
),
client_id_month_and_prev_month AS (
    SELECT 
        client_id, 
        DATE_TRUNC('month', dt) AS m_date,
        LAG(DATE_TRUNC('month', dt)) OVER (PARTITION BY client_id ORDER BY DATE_TRUNC('month', dt)) AS prev_m_date
    FROM azs
    GROUP BY 1,2
)

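SELECT 
    client_id
FROM 
    client_id_month_and_prev_month
GROUP BY 1
HAVING 
    -- a client with a single month has no previous month to compare with
    COUNT(client_id) = 1
    OR
    -- compare the whole interval, not just its month field, so that gaps
    -- longer than a year (e.g. '1 year 1 mon') are not mistaken for 1 month
    MAX(AGE(m_date, prev_m_date)) = INTERVAL '1 month'
ORDER BY 1;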

Output

┌───────────┐
│ client_id │
├───────────┤
│         2 │
│         4 │
└───────────┘
(2 rows)
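If only the client IDs are needed, a possibly shorter query uses a different technique from the window-function versions above and needs no window function at all: a client has no gaps exactly when its number of distinct purchase months equals the number of months from its first to its last purchase. A minimal sketch, assuming the same sample rows as the query above (without liters) and the running month number year * 12 + month; the temporary view name `continuous_clients` is hypothetical and only lets the result be queried again.

```sql
-- Hypothetical view name; the inner query works as a plain SELECT too.
CREATE TEMP VIEW continuous_clients AS
WITH azs (client_id, dt) AS (
    -- same sample rows as in the answer above
    VALUES (1, DATE '2023-11-15'), (1, DATE '2023-12-10'), (1, DATE '2024-01-12'),
           (1, DATE '2024-02-20'), (1, DATE '2024-04-10'),  -- missing March
           (2, DATE '2023-11-12'), (2, DATE '2023-12-11'), (2, DATE '2024-01-15'),
           (2, DATE '2024-02-16'),
           (3, DATE '2024-01-20'), (3, DATE '2024-03-20'),  -- missing February
           (4, DATE '2024-03-21')                           -- single month
)
SELECT client_id
FROM azs
GROUP BY client_id
-- running month number year * 12 + month: consecutive months differ by 1,
-- so "no gaps" means distinct months = last month - first month + 1
HAVING count(DISTINCT date_part('year', dt) * 12 + date_part('month', dt))
     = max(date_part('year', dt) * 12 + date_part('month', dt))
     - min(date_part('year', dt) * 12 + date_part('month', dt)) + 1;

SELECT * FROM continuous_clients ORDER BY client_id;
```

A single-month client passes this check (1 = 0 + 1), which matches the expected output of 2 and 4. To get the monthly rows rather than the IDs, the result can be joined back to the per-month totals.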
