Find new customer_ids for each day in an orders table. This means I want to compare each day with previous days to find the ones existing in that day but not in earlier day.
Table:
order_id | order_date | customer_id |
---|---|---|
1 | 2023-03-27 | 1 |
2 | 2023-03-27 | 2 |
3 | 2023-03-27 | 3 |
4 | 2023-03-27 | 4 |
5 | 2023-03-27 | 5 |
6 | 2023-03-28 | 1 |
7 | 2023-03-28 | 2 |
8 | 2023-03-28 | 6 |
9 | 2023-03-28 | 7 |
10 | 2023-03-28 | 8 |
11 | 2023-03-29 | 7 |
12 | 2023-03-29 | 8 |
13 | 2023-03-29 | 9 |
14 | 2023-03-29 | 10 |
15 | 2023-03-29 | 11 |
16 | 2023-03-30 | 8 |
17 | 2023-03-30 | 9 |
18 | 2023-03-30 | 12 |
19 | 2023-03-30 | 13 |
20 | 2023-03-30 | 14 |
SQL TABLE SCHEMA:
CREATE TABLE orders (
order_id INT NOT NULL,
order_date DATE NOT NULL,
customer_id INT NOT NULL,
PRIMARY KEY (order_id)
);
INSERT INTO orders (order_id, order_date, customer_id)
VALUES
(1, '2023-03-27', 1),
(2, '2023-03-27', 2),
(3, '2023-03-27', 3),
(4, '2023-03-27', 4),
(5, '2023-03-27', 5),
(6, '2023-03-28', 1),
(7, '2023-03-28', 2),
(8, '2023-03-28', 6),
(9, '2023-03-28', 7),
(10, '2023-03-28', 8),
(11, '2023-03-29', 7),
(12, '2023-03-29', 8),
(13, '2023-03-29', 9),
(14, '2023-03-29', 10),
(15, '2023-03-29', 11),
(16, '2023-03-30', 8),
(17, '2023-03-30', 9),
(18, '2023-03-30', 12),
(19, '2023-03-30', 13),
(20, '2023-03-30', 14);
Note: Could someone please explain this step by step. Thank you in advance!
Just pretty confused with the logic I should follow for the question.
UPDATE 1:
I ran the query below and got the desired results.
select * from orders curr
where not exists (
select 1
from orders prev_day
where prev_day.order_date = curr.order_date-1
and prev_day.customer_id = curr.customer_id
)
I am wondering whether there is also a way to get the same output by a more explicit self-join for a beginner? I tried running only the sub-query and got an error and I am not sure why
2
Answers
Your query:
For every row of table it’s selecting
1
(just to check if something exists) from the same table, with samecustomer_id
, butorder_date
1 less then current.If sub-query finds something, current row drops from dataset. If it finds nothing (hence
NOT EXISTS
) current row is added to result dataset.Fiddle here.
This should be more efficient than a
not exists()
strategy:See it here: