skip to Main Content

I’ve a sample dataset

order_num    order_date    email             product_id
    101        2023-06-01   [email protected]            1
    101        2023-06-01   [email protected]            2
    222        2023-06-02   [email protected]            1
    333        2023-06-03   [email protected]            3
    434        2023-06-05   [email protected]            3
    444        2023-06-01   [email protected]            1
    444        2023-06-01   [email protected]            2
    677        2023-06-02   [email protected]            3

To purchase the product_id’s 1, 2 and 3, for user [email protected], it took 3 orders (101, 222, 333) and it took 2 orders for user [email protected].

end table:

email               orders_to_cover_all_products
[email protected]                   3
[email protected]                   2

how can this be achieved using sql?

Query I tried:

select email max(_row_number) from (
select *, row_number() over (partition by email, product_id order by order_num) _row_number from orders) where _row_number <=3 group by email;

2

Answers


  1. One way to solve it is:

    with your_table (order_num, order_date, email, product_id) as (
    
        VALUES
            (101, '2023-06-01', '[email protected]', 1),
            (101, '2023-06-01', '[email protected]', 2),
            (222, '2023-06-02', '[email protected]', 1),
            (333, '2023-06-03', '[email protected]', 3),
            (434, '2023-06-05', '[email protected]', 3),
            (444, '2023-06-01', '[email protected]', 1),
            (444, '2023-06-01', '[email protected]', 2),
            (677, '2023-06-02', '[email protected]', 3)
    ),
    meta as (
    select
        email,
        product_id,
        min(order_date) order_date
    from
        your_table
    group by
        email,
        product_id
    ),
    maxMeta as (
    select
        email,
        max(order_date) last_date_All
    from
        meta
    group by
        email
    
    )
    
    select
        o.email,
        count(distinct order_num) orders_to_cover_all_products
    from
        your_table o
    join maxMeta m on
        m.email = o.email
        and o.order_date <= last_date_all
    group by
        o.email
    
    Login or Signup to reply.
  2. Starting from the first order of each client, you want to count how many orders it took for all three products to be purchased at least once.

    One approach uses window functions to aggregate the products and orders of each client in arrays. We can then filter on rows on the product list. Since distinct is not supported in window functions in Postgres, an additional step is needed to unnest the array of orders and count the distinct values:

    select o.email, min(x.cnt_orders) cnt_orders
    from (
        select o.*,
            array_agg(product_id) over(partition by email order by order_date, order_num) product_ar,
            array_agg(order_num ) over(partition by email order by order_date, order_num) order_ar
        from orders o
    ) o
    cross join lateral (
        select count(distinct x.order_num) as cnt_orders from unnest(o.order_ar) as x(order_num)
    ) x
    where array[1, 2, 3] <@ product_ar
    group by o.email
    
    email cnt_orders
    [email protected] 2
    [email protected] 3

    fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search