skip to Main Content

I am trying to get the ids of the customers who have placed orders on consecutive days. The table is created as follows:

create table orders(
  orderid INT,
  orderdate date,
  customerid int
);

The values:

insert into orders (orderid, orderdate, customerid)
values(1,'2023-06-20',1),
    (2, '2023-06-21', 2),
    (3, '2023-06-22', 3),
    (4, '2023-06-22', 1),
    (5, '2023-06-23', 3),
    (6, '2023-06-22', 1),
    (7, '2023-06-26', 4),
    (8, '2023-06-27', 4),
    (9, '2023-06-29', 4),
    (10, '2023-06-29', 5),
    (11, '2023-06-30', 5),
    (12, '2023-06-28', 5),
    (13, '2023-06-25', 4),
    (14, '2023-06-24', 4),
    (15, '2023-06-30', 4);

The code that I wrote gave the output if ids with orders consecutive days but left the ids of customers who had a gap in their order despite having more number of orders before the gap actually occurred.
The code that I wrote:

with t1 as(
select customerid, orderdate,
case when lead(orderdate) over (partition by customerid order by orderdate) is null then 1
else abs(orderdate - lead(orderdate) over (partition by customerid order by orderdate)) end as gap
from orders)
select customerid, sum(gap) as consecutive
from t1
where gap>0
group by customerid
having count(*)=sum(gap) and count(*)>1;

The output:

+------------+------------------+
| customerid | consecutive_days |
+------------+------------------+
|          3 |                2 |
|          5 |                3 |
+------------+------------------+

The output I would want:

+------------+------------------+
| customerid | consecutive_days |
+------------+------------------+
|          3 |                2 |
|          4 |                4 |
|          4 |                2 |
|          5 |                3 |
+------------+------------------+

Since customer with customerid 4 has ordered from 2023-06-24 to 2023-06-27. The next order from the same customer is on 2023-06-29 and 2023-06-30 hence not continuous and should occur as a separate row.

Edit: The orders placed must be on consecutive days, irrespective of the number of orders placed on a single day.

3

Answers


  1. This is a typical gaps and islands problem and here is one way of getting your desired result:

    SELECT customerid, COUNT(DISTINCT orderdate) AS consecutive_days
    FROM (
        SELECT *, orderdate - INTERVAL DENSE_RANK() OVER (PARTITION BY customerid ORDER BY orderdate) DAY AS grp
        FROM orders
    ) AS order_groups
    GROUP BY customerid, grp
    HAVING consecutive_days > 1;
    

    Outputs:

    customerid consecutive_days
    3 2
    4 4
    4 2
    5 3

    Here’s a db<>fiddle.

    If you look at the result of the derived table you can see that by subtracting the DENSE_RANK from the orderdate we create a value we can use for grouping.

    orderid orderdate customerid grp
    1 2023-06-20 1 2023-06-19
    4 2023-06-22 1 2023-06-20
    6 2023-06-22 1 2023-06-20
    2 2023-06-21 2 2023-06-20
    3 2023-06-22 3 2023-06-21
    5 2023-06-23 3 2023-06-21
    14 2023-06-24 4 2023-06-23
    13 2023-06-25 4 2023-06-23
    7 2023-06-26 4 2023-06-23
    8 2023-06-27 4 2023-06-23
    9 2023-06-29 4 2023-06-24
    15 2023-06-30 4 2023-06-24
    12 2023-06-28 5 2023-06-27
    10 2023-06-29 5 2023-06-27
    11 2023-06-30 5 2023-06-27

    Note that I have used DENSE_RANK, instead of the more typical ROW_NUMBER, to handle customers placing multiple orders on the same day.

    Login or Signup to reply.
  2. This is an other solution to resolve this gaps and islands, it can be resolved by calculating the difference between row numbers (DENSE_RANK is used because various customers may place orders on the same day) , which assigns a unique identifier to each sequence of consecutive orders for a customer :

    WITH cte AS (
      SELECT
        *, DENSE_RANK() over(ORDER BY orderdate) - DENSE_RANK() over (PARTITION BY customerid ORDER BY orderdate) as grp
      FROM orders
    )
    SELECT customerid, COUNT(DISTINCT orderdate) AS consecutive_days 
    FROM cte
    GROUP BY customerid, grp
    HAVING consecutive_days > 1;
    

    Results :

    customerid  consecutive_days
    3           2
    4           4
    4           2
    5           3
    

    Demo here

    Login or Signup to reply.
  3. It really depends on yxour table size.
    If you only have a couple hundred orders, and O=N² is viable, go with something as simple as

    SELECT DISTINCT o1.customerid
    FROM orders o1
    JOIN orders o2 ON o1.customerid = o2.customerid
    AND DATEDIFF(o1.orderdate, o2.orderdate) = 1;
    

    For performance, put a single index over both customerid and orderdate.

    The best way, though somewhat complex, are the MySQL LEAD and LAG functions.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search