I have a table that contains a customer_id, the product they purchased, the timestamp of the purchase, and the rank of the purchase order. I want to know if there is a way in SQL to get overall percentages for the different product combination ordering. Products can be purchased multiple times. Here are some example questions:
What percent of the time is Product B purchased after Product A?
What percent of the time is Product C purchased after Product A?
What percent of the time is Product C purchased after Product B?
I know I could write a bunch of case statements for the different permutations using PRODUCT
and RANK
columns but was curious if there was a more iterative type way in SQL. Also, I know it would be easier in a scripting language like Python but was curious about SQL.
CUST_ID PRODUCT DATE RANK
1000000 Product A 2019-07-30 1
1000000 Product B 2019-11-27 2
1000000 Product B 2020-08-15 3
1000000 Product C 2020-11-07 4
1000001 Product A 2019-05-21 1
1000001 Product B 2019-08-26 2
1000001 Product C 2020-06-26 3
1000002 Product A 2020-06-28 1
1000002 Product A 2020-10-30 2
1000002 Product B 2020-11-23 3
1000003 Product C 2021-07-03 1
1000004 Product C 2021-07-04 2
2
Answers
I’ve had this same problem. I did end up using Python but would love to see if it can be done in SQL.
You can use SQL self-joins to compute these percentages. Here’s a query to answer your example questions:
Replace your_table with the actual table name in your database. This query first computes the count of each product combination (e.g., Product A followed by Product B, Product A followed by Product C, etc.) and then calculates the overall percentage for each combination. The percentages should answer the questions you provided, such as "What percent of the time is Product B purchased after Product A?".