skip to Main Content

I have a table where I store timeseries data:

customer_id transaction_type transaction_date transaction_value
1 buy 2022-12-04 100.0
1 sell 2022-12-04 80.0
2 buy 2022-12-04 120.0
2 sell 2022-12-03 120.0
1 buy 2022-12-02 90.0
1 sell 2022-12-02 70.0
2 buy 2022-12-01 110.0
2 sell 2022-12-01 110.0

Number of customers and transaction types is not limited. Currently there are over 10,000 customers and over 600 transaction types. Dates of transactions ~between customers can be unique and~ will not always align based on any criteria among a customer or transaction type (that’s why I’ve tried using LATERAL JOIN — you’ll see it later).

I want to filter those records to get customers IDs with the values of the transaction where any arbitrary condition is met. Number of those conditions in a query is not restricted to two — can be anything. For example:

Give me all customers who have a buy with value > $90 and a sale with value > 100$ as their latest transactions

The final query should return these two rows:

customer_id transaction_type transaction_date transaction_value
2 buy 2022-12-04 120$
2 sell 2022-12-03 120$

The closest I’ve came to what I need was by creating a materialized view cross-joining customer IDs and transaction_types:

customer_id transaction_type
1 buy
1 sell
2 buy
2 sell

And then running a LATERAL JOIN between table with transactions and customer_transactions materialized view:

SELECT *
  FROM customer_transactions
  JOIN LATERAL (
    SELECT *
      FROM transactions
     WHERE (transactions.customer_id = customer_transactions.customer_id)
       AND (transactions.transaction_type = customer_transactions.transaction_type)
       AND transactions.transaction_date <= '2022-12-04' -- this can change for filtering records back in time
     ORDER BY transactions.transaction_date DESC
     LIMIT 1
  ) transactions ON TRUE
 WHERE customer_transactions.transaction_type = 'buy'
   AND customer_transactions.transaction_value > 90

It seems to be working when one condition is specified. But as soon as subsequential conditions are introduced that’s where things start falling apart for me; changing condition to:

 WHERE (customer_transactions.transaction_type = 'buy'
   AND customer_transactions.transaction_value > 90)
   AND (customer_transactions.transaction_type = 'sell'
   AND customer_transactions.transaction_value > 100)

is obviously not going to work as there is no row that satisfies both of these conditions.

Is it possible to achieve this using the aproach I took? If so what am I missing? Or maybe there is another way to solve that would be more appropriate?

2

Answers


  1. You could use a CTE with row_number and chech out the last transactios

    WITH CTE as (SELECT
    "customer_id", "transaction_type", "transaction_date",
      "transaction_value",
    ROW_NUMBER() OVER(PARTITION BY "customer_id", "transaction_type" ORDER BY  "transaction_date" DESC) rn
    FROM tab1)
    SELECT "customer_id", "transaction_type", "transaction_date",
      "transaction_value" FROM CTE
      WHERE rn = 1 
      AND CASE WHEN "transaction_type" = 'buy' THEN ("transaction_value" > 90) 
    WHEN "transaction_type" = 'sell' THEN ("transaction_value" > 100) 
    ELSE FALSE END 
    AND (SELECT COUNT(*) FROM CTE c1 
      WHERE c1."customer_id"= CTE."customer_id" and rn = 1
        AND CASE WHEN "transaction_type" = 'buy' THEN ("transaction_value" > 90) 
    WHEN "transaction_type" = 'sell' THEN ("transaction_value" > 100) 
    ELSE FALSE END ) = 2
    
    customer_id transaction_type transaction_date transaction_value
    2 buy 2022-12-04 120.0
    2 sell 2022-12-03 120.0
    SELECT 2
    

    fiddle

    Login or Signup to reply.
  2. Use distinct on with custom order to select all the latest transactions per customer according to your several criteria (hence the OR) – latest CTE, then count the number of result records per user using count as a window function – latest_with_count CTE – and finally pick these that have a count equal to the number of criteria, i.e. all the criteria are honoured.
    This may be a bit verbose and abstract template but hopefully would help with the generic problem. The idea would work for any number of conditions.

    with t as
    (
     /*
      your query here with several conditions in DISJUNCTION (OR) here, i.e.
      WHERE (customer_transactions.transaction_type = 'buy'  AND customer_transactions.transaction_value > 90)
         OR (customer_transactions.transaction_type = 'sell' AND customer_transactions.transaction_value > 100)
     */
    ),
    latest as 
    (
     select distinct on (customer_id, transaction_type) *
     from t
     -- pick the latest per customer & type
     order by customer_id, transaction_type, transaction_date desc
    ),
    latest_with_count as
    (
     select *, count(*) over (partition by customer_id) cnt
     from latest
    )
    select * 
    from latest_with_count
    where cnt = 2 -- the number of criteria
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search