I am trying to get the ids of the customers who have placed orders on consecutive days. The table is created as follows:
create table orders(
orderid INT,
orderdate date,
customerid int
);
The values:
insert into orders (orderid, orderdate, customerid)
values(1,'2023-06-20',1),
(2, '2023-06-21', 2),
(3, '2023-06-22', 3),
(4, '2023-06-22', 1),
(5, '2023-06-23', 3),
(6, '2023-06-22', 1),
(7, '2023-06-26', 4),
(8, '2023-06-27', 4),
(9, '2023-06-29', 4),
(10, '2023-06-29', 5),
(11, '2023-06-30', 5),
(12, '2023-06-28', 5),
(13, '2023-06-25', 4),
(14, '2023-06-24', 4),
(15, '2023-06-30', 4);
The code that I wrote gave the output if ids with orders consecutive days but left the ids of customers who had a gap in their order despite having more number of orders before the gap actually occurred.
The code that I wrote:
with t1 as(
select customerid, orderdate,
case when lead(orderdate) over (partition by customerid order by orderdate) is null then 1
else abs(orderdate - lead(orderdate) over (partition by customerid order by orderdate)) end as gap
from orders)
select customerid, sum(gap) as consecutive
from t1
where gap>0
group by customerid
having count(*)=sum(gap) and count(*)>1;
The output:
+------------+------------------+
| customerid | consecutive_days |
+------------+------------------+
| 3 | 2 |
| 5 | 3 |
+------------+------------------+
The output I would want:
+------------+------------------+
| customerid | consecutive_days |
+------------+------------------+
| 3 | 2 |
| 4 | 4 |
| 4 | 2 |
| 5 | 3 |
+------------+------------------+
Since customer with customerid 4 has ordered from 2023-06-24 to 2023-06-27. The next order from the same customer is on 2023-06-29 and 2023-06-30 hence not continuous and should occur as a separate row.
Edit: The orders placed must be on consecutive days, irrespective of the number of orders placed on a single day.
3
Answers
This is a typical gaps and islands problem and here is one way of getting your desired result:
Outputs:
Here’s a db<>fiddle.
If you look at the result of the derived table you can see that by subtracting the DENSE_RANK from the orderdate we create a value we can use for grouping.
Note that I have used DENSE_RANK, instead of the more typical ROW_NUMBER, to handle customers placing multiple orders on the same day.
This is an other solution to resolve this gaps and islands, it can be resolved by calculating the difference between row numbers (
DENSE_RANK
is used because various customers may place orders on the same day) , which assigns a unique identifier to each sequence of consecutive orders for a customer :Results :
Demo here
It really depends on yxour table size.
If you only have a couple hundred orders, and O=N² is viable, go with something as simple as
For performance, put a single index over both customerid and orderdate.
The best way, though somewhat complex, are the MySQL LEAD and LAG functions.