skip to Main Content

I’m working with a Postgres table named orders that looks like this:

user_id   product       order_date
1         pants         7/1/2022
2         shirt         6/1/2022
1         socks         3/17/2023
3         pants         2/17/2023
4         shirt         3/13/2023
2         pants         8/15/2022
1         hat           4/15/2022
5         hat           3/14/2023
2         socks         12/3/2022
3         shirt         4/15/2023
4         socks         1/15/2023
4         pants         4/19/2023
5         shirt         5/2/2023
5         belt          5/15/2023


Here is a dB Fiddle with the data: https://www.db-fiddle.com/f/uNGjP7gpKwdPGrJ7XmT7k3/2

I output a table that shows the sequence of a customer’s orders:

user_id   first_order   second_order    third_order
1         hat           pants           socks
2         shirt         pants           socks
3         pants         shirt           <null>
4         socks         shirt           pants
5         hat           shirt           belt            

So, for example, customer 1 first purchased a hat, then purchased pants, and finally purchased socks.

I’d like to set some sort of indicator at the row level that tells me whether a particular customer purchased one product before they purchased another product. For example, I’d like to indicate whether a customer purchased a shirt before they purchased pants.

The desired output would look like this:

user_id   first_order   second_order    third_order     shirt_before_pants
1         hat           pants           socks           false
2         shirt         pants           socks           true
3         pants         shirt           <null>          false
4         socks         shirt           pants           true
5         hat           shirt           belt            false

Is there a way to get the relative position of a given value at the row level?

Thanks for your help,
-Rachel

2

Answers


  1. We can enumerate the orders of each customer with row_number(), then use conditional aggregation to generate the new columns. To check if a product was bought before another, we can compare the minimum order date of both products:

    select user_id,
        max(product) filter(where rn = 1) product_1,
        max(product) filter(where rn = 2) product_2,
        max(product) filter(where rn = 3) product_3,
        ( 
              min(order_date) filter(where product = 'shirt') 
            < min(order_date) filter(where product = 'pants')
        ) shirt_before_pants
    from (
        select o.*, row_number() over(partition by user_id order by order_date) rn
        from orders o
    ) o
    group by user_id
            
    
    Login or Signup to reply.
  2. This method uses the window function ROW_NUMBER (DENSE_RANK also can work), which assigns a row number to each row aggregated by the user_id. To determine whether the shirt was purchased before the pants, we can compare the generated row_ids of those products :

    With cte as (
      SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_date) AS rn
      FROM orders
    )
    select user_id, max(case when rn = 1 then product end) as first_order,
                    max(case when rn = 2 then product end) as second_order,
                    max(case when rn = 3 then product end) as third_order,
                    MAX(case when product = 'shirt' then rn end) 
                    < MAX(case when product = 'pants' then rn end) as shirt_before_pants
    from cte
    GROUP BY user_id;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search