skip to Main Content

Find new customer_ids for each day in an orders table. This means I want to compare each day with previous days to find the ones existing in that day but not in earlier day.

Table:

order_id order_date customer_id
1 2023-03-27 1
2 2023-03-27 2
3 2023-03-27 3
4 2023-03-27 4
5 2023-03-27 5
6 2023-03-28 1
7 2023-03-28 2
8 2023-03-28 6
9 2023-03-28 7
10 2023-03-28 8
11 2023-03-29 7
12 2023-03-29 8
13 2023-03-29 9
14 2023-03-29 10
15 2023-03-29 11
16 2023-03-30 8
17 2023-03-30 9
18 2023-03-30 12
19 2023-03-30 13
20 2023-03-30 14

SQL TABLE SCHEMA:

CREATE TABLE orders (
  order_id INT NOT NULL,
  order_date DATE NOT NULL,
  customer_id INT NOT NULL,
  PRIMARY KEY (order_id)
);

INSERT INTO orders (order_id, order_date, customer_id)
VALUES
  (1, '2023-03-27', 1),
  (2, '2023-03-27', 2),
  (3, '2023-03-27', 3),
  (4, '2023-03-27', 4),
  (5, '2023-03-27', 5),
  (6, '2023-03-28', 1),
  (7, '2023-03-28', 2),
  (8, '2023-03-28', 6),
  (9, '2023-03-28', 7),
  (10, '2023-03-28', 8),
  (11, '2023-03-29', 7),
  (12, '2023-03-29', 8),
  (13, '2023-03-29', 9),
  (14, '2023-03-29', 10),
  (15, '2023-03-29', 11),
  (16, '2023-03-30', 8),
  (17, '2023-03-30', 9),
  (18, '2023-03-30', 12),
  (19, '2023-03-30', 13),
  (20, '2023-03-30', 14);


Note: Could someone please explain this step by step. Thank you in advance!

Just pretty confused with the logic I should follow for the question.

UPDATE 1:
I ran the query below and got the desired results.

select * from orders curr
where not exists (
  select 1 
  from orders prev_day 
  where prev_day.order_date = curr.order_date-1 
  and prev_day.customer_id = curr.customer_id
  )

I am wondering whether there is also a way to get the same output by a more explicit self-join for a beginner? I tried running only the sub-query and got an error and I am not sure why

2

Answers


  1. Your query:

    select * from orders curr
    where not exists (
      select 1 
      from orders prev_day 
      where prev_day.order_date = curr.order_date-1 
      and prev_day.customer_id = curr.customer_id
      )
    

    For every row of table it’s selecting 1 (just to check if something exists) from the same table, with same customer_id, but order_date 1 less then current.

    If sub-query finds something, current row drops from dataset. If it finds nothing (hence NOT EXISTS) current row is added to result dataset.

    Fiddle here.

    Login or Signup to reply.
  2. This should be more efficient than a not exists() strategy:

    SELECT order_id, order_date, customer_id
    FROM (
        SELECT *,
           LAG(order_date) over (PARTITION BY customer_id ORDER BY order_date) as prior_date
        FROM orders
    ) o
    WHERE coalesce(DATEDIFF(order_date, prior_date),0) <> 1
    

    See it here:

    https://dbfiddle.uk/XYBIEPAN

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search