
I have a Postgres table like this, where an order count for each tenant is inserted whenever it changes.

datetime tenant_id orders
2023-09-15 22:00 tenant3 2
2023-09-16 01:00 tenant1 2
2023-09-16 02:00 tenant1 3
2023-09-16 02:00 tenant2 5
2023-09-16 03:00 tenant1 4

Note that the first row for tenant3 is from yesterday. The number of tenants is dynamic.

Is it possible, based on this data, to build a "time series"-like result that sums the newest order count of every tenant for each hour, like this:

datetime sum
2023-09-16 00:00 2
2023-09-16 01:00 4
2023-09-16 02:00 10
2023-09-16 03:00 11
2023-09-16 04:00 11
2023-09-16 23:00

Answers


  1. Select datetime, sum(orders) 
    From your_table
    Group by datetime
    Order by 1;
    

    If you also want to include dates without orders, there are a few ways to do this. One approach that works on any RDBMS is to join to a utility table that consists of a single column with one row per integer, starting from zero (only the date arithmetic syntax varies by database).

    -- Postgres date arithmetic; other databases use e.g. dateadd()
    Select (select min(datetime) from your_table) + b.value * interval '1 day', sum(orders)
    From your_table a 
    right outer join utility b 
    on ( a.datetime = (select min(datetime) from your_table) + b.value * interval '1 day')
    Group by 1
    Order by 1;
    

    To create utility:

     Create table utility (value int not null);
    
     Insert into utility values (0);
    
     Insert into utility 
     select value + 1 +(select max(value) from utility) 
     from utility;
    

    Repeat the last INSERT statement until utility has enough rows to cover all the days you need. Each run doubles the row count.
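
As a rough check of how quickly the table grows, here is a minimal sketch that runs the same statements against an in-memory SQLite database from Python. SQLite stands in for your database purely for illustration, and `build_utility` is a hypothetical helper name, not part of the answer.

```python
import sqlite3


def build_utility(doublings: int) -> list[int]:
    """Create the utility table and run the doubling INSERT `doublings` times."""
    con = sqlite3.connect(":memory:")  # assumption: SQLite used only as a stand-in
    con.execute("CREATE TABLE utility (value INT NOT NULL)")
    con.execute("INSERT INTO utility VALUES (0)")
    for _ in range(doublings):
        # Appends max+1 .. 2*max+1, so the row count doubles on every run.
        con.execute(
            "INSERT INTO utility "
            "SELECT value + 1 + (SELECT max(value) FROM utility) FROM utility"
        )
    values = [row[0] for row in con.execute("SELECT value FROM utility ORDER BY value")]
    con.close()
    return values


print(build_utility(3))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

After n runs the table holds the integers 0 through 2^n - 1, so nine runs already cover more than a year of days.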

  2. WITH d AS (
      SELECT distinct datetime FROM mytable
    )
    SELECT
      d.datetime,
      d1.orders as "tenant1",
      d2.orders as "tenant2",
      d3.orders as "tenant3",
      d4.orders as "tenant4",
      COALESCE(d1.orders,0) + COALESCE(d2.orders,0) + COALESCE(d3.orders,0) + COALESCE(d4.orders,0) as "sum"
    FROM d
    LEFT JOIN mytable d1 on d1.datetime = d.datetime AND d1.tenant_id = 'tenant1'
    LEFT JOIN mytable d2 on d2.datetime = d.datetime AND d2.tenant_id = 'tenant2'
    LEFT JOIN mytable d3 on d3.datetime = d.datetime AND d3.tenant_id = 'tenant3'
    LEFT JOIN mytable d4 on d4.datetime = d.datetime AND d4.tenant_id = 'tenant4'
    ORDER BY d.datetime
    

    output:

    datetime tenant1 tenant2 tenant3 tenant4 sum
    2023-09-15 22:00:00 null null 2 null 2
    2023-09-16 01:00:00 2 null null null 2
    2023-09-16 02:00:00 3 5 null null 8
    2023-09-16 03:00:00 4 null null null 4

    see: DBFIDDLE

    EDIT: Because the number of tenants is dynamic, here is a version that does not hard-code them:

    WITH d AS (
      SELECT distinct datetime from mytable
    ),
    t as (
      select distinct tenant_id from mytable
    )
    SELECT
      datetime,
      SUM(orders)
    FROM (
      SELECT
        d.datetime,
        t.tenant_id,
        m1.orders
      FROM d
      CROSS JOIN  t
      LEFT JOIN mytable m1 on m1.datetime = d.datetime and m1.tenant_id = t.tenant_id
    ) x
    GROUP BY datetime
    ORDER BY datetime
    

    output:

    datetime sum
    2023-09-15 22:00:00 2
    2023-09-16 01:00:00 2
    2023-09-16 02:00:00 8
    2023-09-16 03:00:00 4

    see: DBFIDDLE

    EDIT2: just for fun, as bonus:

    adding '(' || STRING_AGG(orders::varchar,'+' ORDER BY tenant_id ASC) || ')' as bonus

    see: DBFIDDLE

    EDIT3: added ORDER BY tenant_id ASC to STRING_AGG() to make sure the calculation is in the order of tenants that take part in the sum.

    will result in:

    datetime sum bonus
    2023-09-15 22:00:00 2 (2)
    2023-09-16 01:00:00 2 (2)
    2023-09-16 02:00:00 8 (3+5)
    2023-09-16 03:00:00 4 (4)
  3. SELECT dt AS datetime, sum(orders)
    FROM (SELECT dt, max(datetime) AS datetime, tenant_id
          FROM generate_series(CURRENT_DATE::timestamp, CURRENT_DATE::timestamp + interval '23 hours', '1 hour') gs(dt)
          LEFT JOIN your_table t ON (t.datetime <= gs.dt)
          GROUP BY 1, 3) x
    JOIN (SELECT datetime, tenant_id, orders FROM your_table) y USING (datetime, tenant_id)
    GROUP BY 1
    ORDER BY 1
    

    Subquery x generates a time series of all hours of the current day and joins it to every row in your table at or before each hour (t.datetime <= gs.dt). It then finds the most recent row datetime for each combination of tenant_id and hour. Subquery y supplies the orders for each combination of datetime and tenant_id. The main query joins the two subqueries and sums the orders per hour.

    This produces exactly the result you were asking for, but I had to assume a couple of things, such as that you wanted output for 24 hours of the current day. If you want to change that you should tweak the generate_series() call. You can also further restrict the join condition in subquery x to only count orders from the current month or whatever meets your need.
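
To make the two-step logic easier to follow, here is a minimal Python sketch of the same idea applied to the sample rows from the question. It is an illustration, not the query itself: `ROWS` and `hourly_sums` are hypothetical names, and a fixed date replaces CURRENT_DATE.

```python
from datetime import datetime, timedelta

# (datetime, tenant_id, orders): the sample rows from the question
ROWS = [
    (datetime(2023, 9, 15, 22), "tenant3", 2),
    (datetime(2023, 9, 16, 1), "tenant1", 2),
    (datetime(2023, 9, 16, 2), "tenant1", 3),
    (datetime(2023, 9, 16, 2), "tenant2", 5),
    (datetime(2023, 9, 16, 3), "tenant1", 4),
]


def hourly_sums(rows, day_start, hours=24):
    # Role of subquery y: look up orders by (datetime, tenant_id).
    orders_at = {(dt, tenant): orders for dt, tenant, orders in rows}
    result = []
    for h in range(hours):
        slot = day_start + timedelta(hours=h)
        # Role of subquery x: most recent row datetime per tenant at or before slot.
        latest = {}
        for dt, tenant, _ in rows:
            if dt <= slot and (tenant not in latest or dt > latest[tenant]):
                latest[tenant] = dt
        if latest:  # like the inner join, skip hours with no earlier rows
            total = sum(orders_at[(dt, tenant)] for tenant, dt in latest.items())
            result.append((slot, total))
    return result


sums = [total for _, total in hourly_sums(ROWS, datetime(2023, 9, 16))]
print(sums[:4])  # [2, 4, 10, 11]
```

The nested loop mirrors the join condition t.datetime <= gs.dt: every hour is compared against every row, which is why this approach slows down as the table grows.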

  4. SELECT to_char(datetime, 'YYYY-MM-DD HH24:MI') AS today_hour  -- optional pretty print
         , sum(hour_sum) OVER (ORDER BY GREATEST(datetime, '2023-09-16 00:00')) AS sum
    FROM   generate_series(timestamp '2023-09-16'        -- CURRENT_DATE::timestamp
                         , timestamp '2023-09-16 23:00'  -- CURRENT_DATE::timestamp + interval '23 hours'
                         , interval  '1 hour') AS g(datetime)
    LEFT   JOIN (      
       SELECT GREATEST(datetime, '2023-09-16 00:00') AS datetime  -- CURRENT_DATE
            , sum(delta) AS hour_sum
       FROM  (
          SELECT datetime
               , orders - lag(orders, 1, 0) OVER (PARTITION BY tenant_id ORDER BY datetime) AS delta
          FROM   tbl
          ) sub1
       GROUP  BY 1
       ) sub2 USING (datetime)
    ORDER  BY datetime;
    

    fiddle

    In your daily query, replace the date / timestamp literals with the commented expressions.

    Step 1: subquery sub1

    Get the delta of orders for each tenant between the current row and the previous one (if any). The window function call lag(orders, 1, 0) is instrumental for this: its third argument supplies 0 as the default when there is no previous row.

    Step 2: subquery sub2 / time-series g

    Aggregate delta values per hour in sub2, folding everything before today 00:00 into that first hour with GREATEST.
    In parallel, generate one row for every full hour of the day in g.

    Step 3

    LEFT JOIN to preserve exactly one output row per hour, then compute a running sum with the window function sum(hour_sum).
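
The three steps can be sketched in plain Python as follows, using the sample rows from the question. This is only an illustration of the delta and running-sum idea under stated assumptions: `ROWS` and `running_sums` are hypothetical names, and rows are assumed to fall on full hours within the day.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import accumulate

# (datetime, tenant_id, orders): the sample rows from the question
ROWS = [
    (datetime(2023, 9, 15, 22), "tenant3", 2),
    (datetime(2023, 9, 16, 1), "tenant1", 2),
    (datetime(2023, 9, 16, 2), "tenant1", 3),
    (datetime(2023, 9, 16, 2), "tenant2", 5),
    (datetime(2023, 9, 16, 3), "tenant1", 4),
]


def running_sums(rows, day_start, hours=24):
    previous = defaultdict(int)  # last seen orders per tenant, 0 if none
    hour_sum = defaultdict(int)
    for dt, tenant, orders in sorted(rows):
        # Step 1: delta against the tenant's previous row, like lag(orders, 1, 0).
        delta = orders - previous[tenant]
        previous[tenant] = orders
        # Step 2: bucket per hour; max() plays the role of GREATEST and
        # folds rows from before day_start into the first hour.
        hour_sum[max(dt, day_start)] += delta
    slots = [day_start + timedelta(hours=h) for h in range(hours)]
    # Step 3: one row per hour, then a running total over the hourly deltas.
    totals = accumulate(hour_sum.get(slot, 0) for slot in slots)
    return list(zip(slots, totals))


sums = [total for _, total in running_sums(ROWS, datetime(2023, 9, 16))]
print(sums[:4])  # [2, 4, 10, 11]
```

Each input row is touched once, which matches the claim that this approach scales well with more tenants or more entries per tenant.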

    Alternative query for comparison

    The idea behind Patrick’s query can be implemented more efficiently with DISTINCT ON:

    SELECT datetime, sum(orders)
    FROM (
       SELECT DISTINCT ON (g.datetime, tenant_id)
              g.datetime, orders
       FROM   generate_series(timestamp '2023-09-16', timestamp '2023-09-16 23:00', '1 hour') g(datetime)
       LEFT   JOIN tbl t ON t.datetime <= g.datetime
       ORDER  BY g.datetime, tenant_id, t.datetime DESC NULLS LAST
       ) x
    GROUP BY 1
    ORDER BY 1;
    

    Much simpler and faster. But not as fast as my first query, which scales much better for more tenants and/or more entries per tenant.

    fiddle
