I’m working with a Postgres table named orders
that looks like this:
user_id product order_date
1 pants 7/1/2022
2 shirt 6/1/2022
1 socks 3/17/2023
3 pants 2/17/2023
4 shirt 3/13/2023
2 pants 8/15/2022
1 hat 4/15/2022
5 hat 3/14/2023
2 socks 12/3/2022
3 shirt 4/15/2023
4 socks 1/15/2023
4 pants 4/19/2023
5 shirt 5/2/2023
5 belt 5/15/2023
Here is a dB Fiddle with the data: https://www.db-fiddle.com/f/uNGjP7gpKwdPGrJ7XmT7k3/2
I output a table that shows the sequence of a customer’s orders:
user_id first_order second_order third_order
1 hat pants socks
2 shirt pants socks
3 pants shirt <null>
4 socks shirt pants
5 hat shirt belt
So, for example, customer 1 first purchased a hat, then purchased pants, and finally purchased socks.
I’d like to set some sort of indicator at the row level that tells me whether a particular customer purchased one product before they purchased another product. For example, I’d like to indicate whether a customer purchased a shirt before they purchased pants.
The desired output would look like this:
user_id first_order second_order third_order shirt_before_pants
1 hat pants socks false
2 shirt pants socks true
3 pants shirt <null> false
4 socks shirt pants true
5 hat shirt belt false
Is there a way to get the relative position of a given value at the row level?
Thanks for your help,
-Rachel
2
Answers
We can enumerate the orders of each customer with
row_number()
, then use conditional aggregation to generate the new columns. To check if a product was bought before another, we can compare the minimum order date of both products:This method uses the window function
ROW_NUMBER
(DENSE_RANK also can work), which assigns a row number to each row aggregated by the user_id. To determine whether the shirt was purchased before the pants, we can compare the generated row_ids of those products :