skip to Main Content

In a trading platform, users have three different actions: LOGIN, SEARCH, ORDER.

Abandoned SEARCH action is defined as: customers LOGIN then do some SEARCH and do not ORDER before next LOGIN.
Now I have a table: action_table, recording customer_id, action, request_time.

I want to know how to find all abandoned SEARCH actions via SQL?


Updated:
Here is a brief example:

CREATE TABLE action_table(
   customer_id  VARCHAR(1) NOT NULL
  ,action       VARCHAR(6) NOT NULL
  ,request_time DATE  NOT NULL
);
INSERT INTO action_table(customer_id,action,request_time) 
VALUES 
('A','LOGIN','2023-05-01'),
('A','SEARCH','2023-05-02'),
('A','SEARCH','2023-05-03'),
('A','ORDER','2023-05-04'),
('B','LOGIN','2023-05-01'),
('B','SEARCH','2023-05-02'),
('B','SEARCH','2023-05-03'),
('B','LOGIN','2023-05-04'),
('B','SEARCH','2023-05-05')
customer_id action request_time
A LOGIN 2023-05-01
A SEARCH 2023-05-02
A SEARCH 2023-05-03
A ORDER 2023-05-04
B LOGIN 2023-05-01
B SEARCH 2023-05-02
B SEARCH 2023-05-03
B LOGIN 2023-05-04
B SEARCH 2023-05-05

In this case, NO abandoned SEARCH for A and 3 abandoned SEARCH for B.

Here is my code:

select customer_id, count(1)
from action_table c1
left join 
(
  select customer_id, action, request_time 
  from action_table
  where action = 'LOGIN'
) c2
  on c1.customer_id = c2.customer_id
  and c2.request_time > c1.request_time
left join 
(
  select customer_id, action, request_time 
  from action_table
  where action = 'ORDER'
) c3
  on c1.customer_id = c3.customer_id
  and c3.request_time > c1.request_time
  and c3.request_time < c2.request_time
where c1.action = 'SEARCH'
  and c2.customer_id IS NOT NULL
  and c3.customer_id IS NULL
group by 1

It is redundant and does not seem to work well 🙁

2

Answers


  1. Chosen as BEST ANSWER

    Here is my solution. This is not an optimal one but still works.

    The basic approach is to find the most recent LOGIN action after each SEARCH. Then check if ORDER exists between search time and most recent login time. Two subqueries are used :(

    SELECT *
    FROM action_table AS a
    WHERE a.action = 'SEARCH'
      AND NOT EXISTS (
        SELECT 1
        FROM action_table
        WHERE customer_id = a.customer_id
          AND action = 'ORDER'
          AND request_time > a.request_time
          AND request_time < (
            SELECT MIN(request_time)
            FROM action_table
            WHERE customer_id = a.customer_id
              AND action = 'LOGIN'
              AND request_time > a.request_time
          )
    );
    

  2. One option uses LEAD; the idea is to check the two following actions of each login event to identify abandonned searches:

    select customer_id, count(*) cnt_abandonned_search
    from (
        select a.*, 
            lead(action, 1) over(partition by customer_id order by request_time) lead_action_1,
            lead(action, 2) over(partition by customer_id order by request_time) lead_action_2
        from action_table a
    ) a
    where action = 'LOGIN'                         -- a LOGIN action...
      and lead_action_1 = 'SEARCH'                 -- ... followed by a SEARCH
      and lead_action_2 is distinct from 'ORDER'   -- ... but not followed by an ORDER
    group by customer_id
    

    If you wanted to focus on searches, and get the entire record of each abandoned search, we can use the same logic but with lead and lag:

    select *
    from (
        select a.*, 
            lag (action) over(partition by customer_id order by request_time) lag_action,
            lead(action) over(partition by customer_id order by request_time) lead_action
        from action_table a
    ) a
    where action = 'SEARCH'                    -- a SEARCH action...
      and lag_action = 'LOGIN'                 -- ... preceded by a LOGIN
      and lead_action is distinct from 'ORDER' -- ... but not followed by an ORDER
    

    Demo on DB Fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search