This is a slightly more complicated reporting problem than I usually have. I want to find how many times customers have repurchased individual product listings. (Say, some products have really high sales, but we want to know how much are from first-time purchases and how much are repurchases.)
items_purchased:
+----------+------------+----------+
| buyer_id | listing_id | quantity |
+----------+------------+----------+
| 1234 | 5678 | 1 |
| 1234 | 3456 | 1 |
| 9012 | 3456 | 2 |
| 7901 | 5678 | 1 |
| 1234 | 5678 | 1 |
+----------+------------+----------+
From that data, I can see listing 5678
has been purchased 3 times but repurchased once. Listing 3456
has been purchased twice but never repurchased. I want to build a report like:
+------------+-----------+-------------+-----------------+
| listing_id | purchases | repurchases | repurchase_rate |
+------------+-----------+-------------+-----------------+
| 5678 | 1364 | 152 | 11% |
| 3456 | 892 | 289 | 32% |
| 3284 | 51 | 30 | 59% |
| 8763 | 8 | 0 | 0% |
+------------+-----------+-------------+-----------------+
I think I need a DISTINCT
or GROUP BY
somewhere, definitely a SUM
I’m guessing, but I’m stumped how to do it. (The "repurchase_rate" I know how to do.)
4
Answers
Off the top of my head, you could build a CTE that summarizes the count of purchases of a particular
listing_id
bybuyer_id
, subtracting 1 from each to include only repurchases:DB Fiddle
This is probably not the most computationally efficient way to do this considering the somewhat gratuitous use of subqueries, but I’m confident there’s room for it to be optimized further to accommodate scaled usage.
This query does the following:
first_purchases
identifies the first purchase for eachbuyer_id
andlisting_id
items_purchased
table twice: once to count all purchases (p
) and once to count repurchases (r
)repurchases
count is filtered to only include purchases where thequantity
is greater than thefirst_purchase
quantityrepurchase_rate
and formats the outputPlease note that the above query assumes that a repurchase is defined as any subsequent purchase where the quantity is greater than the first purchase quantity. Adjust the logic as needed for your specific definition of a repurchase.
You can do:
Result:
See running example at db<>fiddle.
Use
count()
andcount() - count(distinct)
to count purchases and repurchases, respectively.db fiddle