I have a table where I store timeseries data:
customer_id | transaction_type | transaction_date | transaction_value |
---|---|---|---|
1 | buy | 2022-12-04 | 100.0 |
1 | sell | 2022-12-04 | 80.0 |
2 | buy | 2022-12-04 | 120.0 |
2 | sell | 2022-12-03 | 120.0 |
1 | buy | 2022-12-02 | 90.0 |
1 | sell | 2022-12-02 | 70.0 |
2 | buy | 2022-12-01 | 110.0 |
2 | sell | 2022-12-01 | 110.0 |
Number of customers and transaction types is not limited. Currently there are over 10,000 customers and over 600 transaction types. Dates of transactions ~between customers can be unique and~ will not always align based on any criteria among a customer or transaction type (that’s why I’ve tried using LATERAL JOIN
— you’ll see it later).
I want to filter those records to get customers IDs with the values of the transaction where any arbitrary condition is met. Number of those conditions in a query is not restricted to two — can be anything. For example:
Give me all customers who have a buy with value > $90 and a sale with value > 100$ as their latest transactions
The final query should return these two rows:
customer_id | transaction_type | transaction_date | transaction_value |
---|---|---|---|
2 | buy | 2022-12-04 | 120$ |
2 | sell | 2022-12-03 | 120$ |
The closest I’ve came to what I need was by creating a materialized view cross-joining customer IDs and transaction_types:
customer_id | transaction_type |
---|---|
1 | buy |
1 | sell |
2 | buy |
2 | sell |
And then running a LATERAL JOIN
between table with transactions and customer_transactions
materialized view:
SELECT *
FROM customer_transactions
JOIN LATERAL (
SELECT *
FROM transactions
WHERE (transactions.customer_id = customer_transactions.customer_id)
AND (transactions.transaction_type = customer_transactions.transaction_type)
AND transactions.transaction_date <= '2022-12-04' -- this can change for filtering records back in time
ORDER BY transactions.transaction_date DESC
LIMIT 1
) transactions ON TRUE
WHERE customer_transactions.transaction_type = 'buy'
AND customer_transactions.transaction_value > 90
It seems to be working when one condition is specified. But as soon as subsequential conditions are introduced that’s where things start falling apart for me; changing condition to:
WHERE (customer_transactions.transaction_type = 'buy'
AND customer_transactions.transaction_value > 90)
AND (customer_transactions.transaction_type = 'sell'
AND customer_transactions.transaction_value > 100)
is obviously not going to work as there is no row that satisfies both of these conditions.
Is it possible to achieve this using the aproach I took? If so what am I missing? Or maybe there is another way to solve that would be more appropriate?
2
Answers
You could use a CTE with row_number and chech out the last transactios
fiddle
Use distinct on with custom order to select all the latest transactions per customer according to your several criteria (hence the
OR
) –latest
CTE, then count the number of result records per user usingcount
as a window function –latest_with_count
CTE – and finally pick these that have a count equal to the number of criteria, i.e. all the criteria are honoured.This may be a bit verbose and abstract template but hopefully would help with the generic problem. The idea would work for any number of conditions.