
I have a database of online store customers that contains the following data:

  • buyer ID;
  • order number;
  • date and time of the order;
  • order amount.

The data covers customers and orders over 3 months.

How can I select the customers who placed more than 1 order within any 3-hour range on a single day?

Is it possible to do this with a single query, or is it necessary to check each three-hour range manually?

For example, buyer ID 10321630 placed 5 orders on May 26 between 19:00 and 22:00 (19:15, 19:37, 19:51, 21:07, 21:18). This buyer should appear in the query result.

The only tips I found online were about BETWEEN, but that does not quite fit this case.

3 Answers


  1. BETWEEN does fit this case. You can apply it to the ::time cast of what I’m guessing is your purchase timestamp (demo at db<>fiddle):

    select buyer_id
         , placed_at::date as placed_date
         , count(*) as number_of_orders
         , array_agg(placed_at::time(0)) as order_times
    from orders 
    where placed_at::time between '19:00' 
                              and '22:00' 
    group by 1,2 
    having count(*)>1;
    
    buyer_id  placed_date  number_of_orders  order_times
    10321630  2024-05-26   5                 {19:15:00,21:18:00,21:07:00,19:51:00,19:37:00}
    10321633  2024-06-15   2                 {19:29:19,20:07:55}

    If you mean any continuous 3-hour period within a single day, rather than the specific 19:00 to 22:00 window from your example, it’s a bit trickier: you need a rolling count, which means using count(*) as a window function whose frame covers only the 3 hours up to and including each row.

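    -- distinct on keeps one row per buyer, day and first order in the window;
    -- the final order by makes that row the one with the most orders
    select distinct on(1,2,order_times[1]) * from (
    select buyer_id
         , placed_at::date as placed_date
         , count(*)over w1 as number_of_orders
         , array_agg(placed_at::time(0))over w1 as order_times
    from orders 
    -- one partition per buyer per day; the frame is the 3 hours up to each order
    window w1 as (partition by buyer_id,placed_at::date
                  order by placed_at::time
                  range between '3 hours' preceding
                            and current row))_
    where number_of_orders>1
    order by 1,2,order_times[1],number_of_orders desc;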
    
    buyer_id  placed_date  number_of_orders  order_times
    10321630  2024-05-26   5                 {19:15:00,19:37:00,19:51:00,21:07:00,21:18:00}
    10321633  2024-06-15   2                 {19:29:19,20:07:55}
    10321630  2024-06-10   2                 {10:06:42,11:35:34}
    10321630  2024-06-11   2                 {13:48:25,16:42:21}

    The problem is that you may get overlapping windows. The distinct on above removes some of them: for windows that start at the same order, it keeps only the longest one instead of listing all its sub-periods.
    Others remain: if you have one order at midnight, another at 2:59 and another at 4:58, the middle one shows up twice, once paired with midnight and once paired with 4:58.
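
To make the overlap case above concrete, here is a small Python sketch (not part of the original answer) that mimics the window frame outside the database. The function name `trailing_windows`, the helper `fmt` and the date are made up for illustration. It takes one buyer's orders for one day and collects the orders in the 3 hours up to and including each order.

```python
from datetime import datetime, timedelta

def trailing_windows(times, span=timedelta(hours=3)):
    """For each order time, list the orders within `span` before it (inclusive).

    Roughly mirrors a frame of RANGE BETWEEN '3 hours' PRECEDING AND CURRENT ROW
    for a single buyer on a single day, keeping only frames with more than 1 order.
    """
    ordered = sorted(times)
    windows = []
    for current in ordered:
        frame = [t for t in ordered if current - span <= t <= current]
        if len(frame) > 1:  # same filter as number_of_orders > 1
            windows.append(frame)
    return windows

def fmt(frame):
    """Render a frame as HH:MM strings for display."""
    return [t.strftime("%H:%M") for t in frame]

day = datetime(2024, 6, 1)  # hypothetical date, not taken from the question's data
orders = [
    day,                                    # 00:00
    day + timedelta(hours=2, minutes=59),   # 02:59
    day + timedelta(hours=4, minutes=58),   # 04:58
]
result = [fmt(w) for w in trailing_windows(orders)]
print(result)
```

This prints `[['00:00', '02:59'], ['02:59', '04:58']]`: the 02:59 order appears in two windows, and because the windows start at different orders, a distinct on keyed by the first element keeps both.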

  2. This query returns what you want, though I wrote it for MySQL:

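    select buyer_id, count(id), group_concat(DATE_FORMAT(order_date, '%H:%i:%s'))
    from orders
    -- compare the time of day: HOUR(order_date) BETWEEN 19 AND 22 would also match 22:01 to 22:59
    where TIME(order_date) BETWEEN '19:00:00' AND '22:00:00'
    group by buyer_id, DATE(order_date)
    having count(*) > 1;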
    

    The idea is to group by buyer_id and by the date part of order_date (without the time).

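  3. Compare each order with the same buyer's previous order using lag():

    with dataX as (
        select *,
            -- previous order of the same buyer (partitioned by buyer only, so it may fall on an earlier day)
            lag(OrderTimestamp) over (partition by BuyerId order by OrderTimestamp) as lastOrderTimestamp
        from T
    )
    select distinct BuyerId
    from dataX
    -- keep buyers whose previous order was less than 3 hours earlier
    where OrderTimestamp + interval '-3 hours' < lastOrderTimestamp;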
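
The lag() query above partitions by BuyerId only, so two orders on either side of midnight also count as a pair. The sketch below (not from the original answer) runs the same idea in an in-memory SQLite database from Python to show the difference once the date is added to the partition. It assumes SQLite 3.25 or newer for window functions and uses SQLite's `strftime('%s', ...)` and `date()` in place of PostgreSQL interval arithmetic. The table contents and the function name `buyers_within_3h` are made up for illustration.

```python
import sqlite3

QUERY = """
with dataX as (
    select BuyerId, OrderTimestamp,
           lag(OrderTimestamp) over (
               partition by {partition} order by OrderTimestamp
           ) as lastOrderTimestamp
    from T
)
select distinct BuyerId
from dataX
-- seconds since the previous order; NULL (no previous order) is filtered out
where strftime('%s', OrderTimestamp) - strftime('%s', lastOrderTimestamp) < 3 * 3600
order by BuyerId
"""

def buyers_within_3h(conn, same_day):
    """Return buyers with two consecutive orders less than 3 hours apart."""
    partition = "BuyerId, date(OrderTimestamp)" if same_day else "BuyerId"
    return [row[0] for row in conn.execute(QUERY.format(partition=partition))]

conn = sqlite3.connect(":memory:")
conn.execute("create table T (BuyerId integer, OrderTimestamp text)")
conn.executemany(
    "insert into T values (?, ?)",
    [
        # illustrative rows only; just the first buyer ID comes from the question
        (10321630, "2024-05-26 19:15:00"),
        (10321630, "2024-05-26 19:37:00"),
        (10321630, "2024-05-26 21:07:00"),
        (10321631, "2024-05-26 08:00:00"),  # 4.5 hours apart: never matches
        (10321631, "2024-05-26 12:30:00"),
        (10321632, "2024-05-26 23:30:00"),  # 75 minutes apart, across midnight
        (10321632, "2024-05-27 00:45:00"),
    ],
)

any_day = buyers_within_3h(conn, same_day=False)
same_day = buyers_within_3h(conn, same_day=True)
print(any_day, same_day)
```

Checking only consecutive orders is enough here: if any 3-hour span contains more than one order, some pair of neighbouring orders in it is also less than 3 hours apart. In PostgreSQL the equivalent change would presumably be adding `OrderTimestamp::date` to the `partition by` list.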
    