I’m working with the following tables in a Postgres database:
card_purchases
purchaser_id purchase_date card_id
44 12/10/2021 3
3 1/27/2022 1
19 1/31/2022 2
22 2/15/2022 1
4 6/9/2022 4
17 8/20/2022 2
19 2/4/2023 2
22 3/17/2023 1
3 3/19/2023 2
747 3/24/2023 2
1193 4/14/2023 1
card_templates
card_id card_name
1 discount pass
2 exclusive pass
3 senior citizen discount
4 customer loyalty
discounts
discount_id discount_name
1001 goodwill applied
1002 discount pass applied
1003 exclusive pass applied
1004 exclusive pass applied
1005 discount pass applied
1006 exclusive pass applied
1007 exclusive pass applied
1008 discount pass applied
1009 exclusive pass applied
1010 exclusive pass applied
1011 discount pass applied
1012 exclusive pass applied
1013 exclusive pass applied
1014 exclusive pass applied
1015 exclusive pass applied
1016 discount pass applied
discount_orders
order_id discount_id created_at purchaser_id
1100 1002 1/28/2022 3
1101 1003 1/31/2022 19
1102 1004 2/4/2022 19
1103 1005 3/15/2022 22
1104 1006 3/17/2022 19
1105 1007 8/27/2022 17
1106 1008 8/30/2022 22
1107 1009 2/4/2023 19
1108 1010 2/19/2023 19
1109 1011 3/18/2023 22
1110 1012 3/19/2023 19
1111 1013 3/31/2023 747
1112 1014 4/5/2023 19
1113 1015 4/15/2023 747
1114 1016 4/20/2023 1193
Here is a SQL Fiddle with all of the data: http://sqlfiddle.com/#!17/e64c2/1
By way of some background, our customers buy "cards" (i.e. passes), which give them the right to purchase our products at a discounted price. The cards are typically valid for 1 year, at which point customers can purchase another card.
My goal is to see the number of purchases that each customer has made after they most recently purchased a card. The problem is that I don’t know how to compute the count if a customer has purchased more than one card.
For example, customer 19 purchased two cards: on 1/31/22 and then again on 2/4/23. This customer used the second card a total of 4 times.
SELECT
dord.purchaser_id AS user_id
, COUNT(DISTINCT(dord.order_id)) AS count_orders
FROM
discounts d
JOIN
discount_orders dord
ON d.discount_id = dord.discount_id
WHERE
d.discount_name LIKE '%pass%'
AND dord.purchaser_id IN (
SELECT
cp.purchaser_id
FROM
card_purchases cp
JOIN
card_templates ct
ON cp.card_id = ct.card_id
WHERE
ct.card_id IN (1, 2)
)
GROUP BY
dord.purchaser_id
This code is outputting a count of 7 purchases for customer 19 (which is correct). However, I’m only interested in seeing purchased made on or after the purchase of the most recent card. So, for customer 19, this would be 4 purchases (because the 2nd card was purchased on 2/4/23 and there were 4 purchases made on or after this date).
How do we fix this code such that the count is only applied to orders placed on or after the customer’s most recent purchase of a card?
3
Answers
You can use row_number() window function to select only the last card purchase of the customers.
Query:
Output
| user_id | count_orders |
|——–:|————-:|
| 17 | 1 |
| 19 | 4 |
| 22 | 1 |
| 747 | 2 |
| 1193 | 1 |
fiddle
The overall query logic looks good, we just need to adapt the subquery filtering. Instead of
IN
, we can use a correlated subquery that returns the date of the latest card purchase of the current purchaser:Note that, for the data that you have shown, there is no need to use
distinct
. Also the join oncards_template
in the subquery seems unnecessary.We can also express the logic with a join:
This is a perfect opportunity to apply a lateral join. Find the card purchases of interest and then query for the number of orders. In this instance a scalar subquery also works since there’s only a single value to return. The lateral join would allow for other columns to be added.
Equivalent to:
I removed the filter on the discount name as I don’t believe it’s necessary here.
https://dbfiddle.uk/5QN_gb2s?hide=4