I’ve a sample dataset
order_num order_date email product_id
101 2023-06-01 [email protected] 1
101 2023-06-01 [email protected] 2
222 2023-06-02 [email protected] 1
333 2023-06-03 [email protected] 3
434 2023-06-05 [email protected] 3
444 2023-06-01 [email protected] 1
444 2023-06-01 [email protected] 2
677 2023-06-02 [email protected] 3
To purchase the product_id’s 1, 2 and 3, for user [email protected]
, it took 3 orders (101, 222, 333) and it took 2 orders for user [email protected]
.
end table:
email orders_to_cover_all_products
[email protected] 3
[email protected] 2
how can this be achieved using sql?
Query I tried:
select email max(_row_number) from (
select *, row_number() over (partition by email, product_id order by order_num) _row_number from orders) where _row_number <=3 group by email;
2
Answers
One way to solve it is:
Starting from the first order of each client, you want to count how many orders it took for all three products to be purchased at least once.
One approach uses window functions to aggregate the products and orders of each client in arrays. We can then filter on rows on the product list. Since
distinct
is not supported in window functions in Postgres, an additional step is needed to unnest the array of orders and count the distinct values:fiddle