
There is a table of customers with the purchases they made in each month. If a customer did not buy anything in a given month, that month is simply missing from the table.

Task:
I need to extract only the rows of customers who made purchases in every month, with no gaps.

I tried to create a flag, but I don't understand how to remove the first row of the table. I managed to do it using GROUP BY and HAVING, but I would like to see a shorter query.

The DBMS used is PostgreSQL.

with cte as (
    select 
        client_id, 
        date_trunc('month', dt)::date as m_date, 
        sum(liters) as s_lit
    from azs 
    group by client_id, m_date
    order by 1, 2
), 
cte2 as (
    SELECT client_id, m_date, 
        CASE WHEN DATE_PART('month', m_date) - DATE_PART('month', lag(m_date) over (partition by client_id order by m_date)) IS NULL THEN 1       
             WHEN DATE_PART('month', m_date) - DATE_PART('month', lag(m_date) over (partition by client_id order by m_date)) > 1 THEN 1
             ELSE 0
        END AS flag
    FROM cte
)
    
select * from cte2

2 Answers


  1. You can use a common table expression (CTE) to group the data by client and month, then a window function to find gaps between consecutive purchase months, and finally filter out any customers with gaps. Try this:

    WITH monthly_purchases AS (
        SELECT 
            client_id, 
            date_trunc('month', dt)::date AS m_date,
            sum(liters) AS s_lit
        FROM azs
        GROUP BY client_id, m_date
    ),
    gaps AS (
        -- Pair every purchase month with the client's previous purchase month
        SELECT 
            client_id,
            m_date,
            lag(m_date) OVER (PARTITION BY client_id ORDER BY m_date) AS prev_m_date
        FROM monthly_purchases
    ),
    continuous_customers AS (
        -- Keep a client only if every month directly follows the previous one;
        -- the first month has no predecessor (prev_m_date IS NULL) and is allowed
        SELECT 
            client_id
        FROM gaps
        GROUP BY client_id
        HAVING bool_and(prev_m_date IS NULL OR m_date = prev_m_date + interval '1 month')
    )
    SELECT 
        mp.client_id, 
        mp.m_date, 
        mp.s_lit
    FROM 
        monthly_purchases mp
    JOIN 
        continuous_customers cc ON mp.client_id = cc.client_id
    ORDER BY 
        mp.client_id, 
        mp.m_date;
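
Since the question asks for a shorter query, here is a minimal sketch of a more compact check. It assumes the same `azs` table with `client_id` and `dt` columns; the name `month_idx` is only illustrative. Each date is mapped to a running month number (`year * 12 + month`), so a client has no gaps exactly when the span between the first and last month equals the number of distinct months.

```sql
-- Sketch: continuous clients have as many distinct months
-- as there are months between their first and last purchase.
SELECT client_id
FROM (
    SELECT DISTINCT
        client_id,
        -- running month number; crossing a year boundary still adds exactly 1
        (extract(year FROM dt) * 12 + extract(month FROM dt))::int AS month_idx
    FROM azs
) m
GROUP BY client_id
HAVING max(month_idx) - min(month_idx) + 1 = count(*);
```

This returns only the client ids; joining the result back to the monthly totals would give the full rows.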
    
  2. "I managed to do it using GROUP BY and HAVING, but I would like to see a shorter query."

    I believe it will be hard to avoid a group-by for this task, but it can be done with a single WITH-clause before the group-by.

    In the query below, client_id=1 is missing orders in March, client_id=3 is missing orders in February, and client_id=4 has a single order in March. Hence the expected output is client_id 2 and 4.

    The query was tested in a Postgres playground.

    WITH azs AS (
        SELECT 1 AS client_id, DATE '2023-11-15' AS dt
        UNION ALL
        SELECT 1, DATE '2023-12-10'
        UNION ALL
        SELECT 1, DATE '2024-01-12'
        UNION ALL
        SELECT 1, DATE '2024-02-20'
        UNION ALL
        SELECT 1, DATE '2024-04-10' -- Missing March
        UNION ALL
        SELECT 2, DATE '2023-11-12'
        UNION ALL
        SELECT 2, DATE '2023-12-11'
        UNION ALL
        SELECT 2, DATE '2024-01-15'
        UNION ALL
        SELECT 2, DATE '2024-02-16'
        UNION ALL
        SELECT 3, DATE '2024-01-20'
        UNION ALL
        SELECT 3, DATE '2024-03-20'  -- Missing February
        UNION ALL
        SELECT 4, DATE '2024-03-21'   -- Single month
    ),
    client_id_month_and_prev_month AS (
        SELECT 
            client_id, 
            DATE_TRUNC('month', dt) AS m_date,
            LAG(DATE_TRUNC('month', dt)) OVER (PARTITION BY client_id ORDER BY DATE_TRUNC('month', dt)) AS prev_m_date
        FROM azs
        GROUP BY 1,2
    )
    
    SELECT 
        client_id
    FROM 
        client_id_month_and_prev_month
    GROUP BY 1
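    HAVING 
        COUNT(client_id) = 1
        OR
        -- Compare the whole interval rather than only its month field,
        -- so that a gap of 13 months is not mistaken for 1 month
        MAX(AGE(m_date, prev_m_date)) = INTERVAL '1 month';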
    

    Output

    ┌───────────┐
    │ client_id │
    ├───────────┤
    │         2 │
    │         4 │
    └───────────┘
    (2 rows)
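
One caveat when measuring gaps with `AGE`: `DATE_PART('month', ...)` returns only the months field of the interval, not the total number of months. The small sketch below uses made-up dates that are not part of the sample data to show that a 13-month gap also has a month field of 1, which is why comparing the whole interval is the safer test.

```sql
-- Illustrative dates only: February 2024 is 13 months after January 2023.
SELECT
    age(DATE '2024-02-01', DATE '2023-01-01')                       AS gap,          -- 1 year 1 mon
    date_part('month', age(DATE '2024-02-01', DATE '2023-01-01'))   AS month_field,  -- 1
    age(DATE '2024-02-01', DATE '2023-01-01') = INTERVAL '1 month'  AS is_one_month; -- false
```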
    