Postgresql - Time series with running sum of latest orders per tenant

dhrm
September 18, 2023
274 views
4 votes
4 Answers

I have a Postgres table like this, where an order count for each tenant is inserted whenever it changes:

datetime	tenant_id	orders
2023-09-15 22:00	tenant3	2
2023-09-16 01:00	tenant1	2
2023-09-16 02:00	tenant1	3
2023-09-16 02:00	tenant2	5
2023-09-16 03:00	tenant1	4

Note that the first row for tenant3 is from yesterday. The number of tenants is dynamic.

Is it possible, based on this data, to build a time-series-like result that sums the newest order count of all tenants for each hour, like this:

datetime	sum
2023-09-16 00:00	2
2023-09-16 01:00	4
2023-09-16 02:00	10
2023-09-16 03:00	11
2023-09-16 04:00	11
…	…
2023-09-16 23:00	…

Answers

- tpdi
- September 16, 2023 at 10:33 am
- 0 votes
```
Select datetime, sum(orders) 
From your_table
Group by datetime
Order by 1;
```
If you also want to include dates without orders, there are a few ways to do this. The way that works on any RDBMS is to join to a utility table that consists of a single integer column with one row per integer, starting from zero (the date arithmetic itself is vendor-specific).
```
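-- Note: dateadd() is SQL Server-style syntax, not PostgreSQL; in PostgreSQL write
-- (select min(datetime) from your_table) + b.value * interval '1 day' instead.
Select dateadd(day, b.value, (select min(datetime) from your_table)), sum(orders)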
From your_table a 
right outer join utility b 
on ( a.datetime = dateadd(day, b.value, (select min(datetime) from your_table)))
Group by dateadd(day, b.value, (select min(datetime) from your_table))
Order by 1;
```
To create utility:
```
 Create table utility (value int not null);

 Insert into utility values (0);

 Insert into utility 
 select value + 1 +(select max(value) from utility) 
 from utility;
```
Then repeat the last INSERT until utility has enough rows to cover all the days you need; each run doubles the number of rows.
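To see why repeating that INSERT doubles the table each time, here is a minimal plain-Python model of the recipe (not SQL). The function name `fill_utility` is hypothetical. It assumes, as PostgreSQL does, that the `max(value)` subquery sees the table as it was when the statement started, not the rows the same statement is inserting.

```python
def fill_utility(repetitions):
    """Model of the utility-table recipe.

    The table starts as {0}; each repetition of
        INSERT INTO utility SELECT value + 1 + (SELECT max(value) FROM utility) FROM utility;
    adds one new row per existing row, offset past the current maximum.
    """
    utility = [0]  # INSERT INTO utility VALUES (0);
    for _ in range(repetitions):
        current_max = max(utility)  # evaluated once, before any new row is added
        # Build all new rows from the existing ones, then append them.
        utility += [value + 1 + current_max for value in utility]
    return utility


print(len(fill_utility(9)))  # 512 rows: 0..511, enough for 512 consecutive days
```

The values stay contiguous (0, 1, 2, ...), so after k repetitions the table holds 0 through 2^k - 1.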

```
WITH d AS (
  SELECT distinct datetime FROM mytable
)
SELECT
  d.datetime,
  d1.orders as "tenant1",
  d2.orders as "tenant2",
  d3.orders as "tenant3",
  d4.orders as "tenant4",
  COALESCE(d1.orders,0) + COALESCE(d2.orders,0) + COALESCE(d3.orders,0) + COALESCE(d4.orders,0) as "sum"
FROM d
LEFT JOIN mytable d1 on d1.datetime = d.datetime AND d1.tenant_id = 'tenant1'
LEFT JOIN mytable d2 on d2.datetime = d.datetime AND d2.tenant_id = 'tenant2'
LEFT JOIN mytable d3 on d3.datetime = d.datetime AND d3.tenant_id = 'tenant3'
LEFT JOIN mytable d4 on d4.datetime = d.datetime AND d4.tenant_id = 'tenant4'
ORDER BY d.datetime
```

output:

datetime	tenant1	tenant2	tenant3	tenant4	sum
2023-09-15 22:00:00	null	null	2	null	2
2023-09-16 01:00:00	2	null	null	null	2
2023-09-16 02:00:00	3	5	null	null	8
2023-09-16 03:00:00	4	null	null	null	4

see: DBFIDDLE

EDIT: Because the number of tenants is dynamic, here is a version that does not hard-code the tenant IDs:

```
WITH d AS (
  SELECT distinct datetime from mytable
),
t as (
  select distinct tenant_id from mytable
)
SELECT
  datetime,
  SUM(orders)
FROM (
  SELECT
    d.datetime,
    t.tenant_id,
    m1.orders
  FROM d
  CROSS JOIN  t
  LEFT JOIN mytable m1 on m1.datetime = d.datetime and m1.tenant_id = t.tenant_id
) x
GROUP BY datetime
ORDER BY datetime
```

output:

datetime	sum
2023-09-15 22:00:00	2
2023-09-16 01:00:00	2
2023-09-16 02:00:00	8
2023-09-16 03:00:00	4

see: DBFIDDLE
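The following is a plain-Python model of the dynamic-tenant query (not SQL), using the question's sample rows. The names `ROWS` and `sum_per_timestamp` are hypothetical. It reproduces the output table above and makes visible that each timestamp only sums the rows stored at exactly that timestamp. Values are not carried forward, which is why 02:00 gives 8 rather than the 10 in the question.

```python
from datetime import datetime

# Sample rows from the question: (datetime, tenant_id, orders)
ROWS = [
    (datetime(2023, 9, 15, 22), "tenant3", 2),
    (datetime(2023, 9, 16, 1), "tenant1", 2),
    (datetime(2023, 9, 16, 2), "tenant1", 3),
    (datetime(2023, 9, 16, 2), "tenant2", 5),
    (datetime(2023, 9, 16, 3), "tenant1", 4),
]


def sum_per_timestamp(rows):
    """Model: distinct datetimes CROSS JOIN distinct tenants, LEFT JOIN the table,
    then SUM(orders) GROUP BY datetime."""
    datetimes = sorted({dt for dt, _, _ in rows})        # CTE d
    tenants = sorted({tenant for _, tenant, _ in rows})  # CTE t
    orders_at = {(dt, tenant): orders for dt, tenant, orders in rows}
    result = []
    for dt in datetimes:
        # A missing (datetime, tenant) pair is a NULL from the LEFT JOIN; SUM() skips it.
        # Every datetime comes from the table, so at least one pair always matches.
        matched = [orders_at[(dt, t)] for t in tenants if (dt, t) in orders_at]
        result.append((dt, sum(matched)))
    return result


for dt, total in sum_per_timestamp(ROWS):
    print(dt, total)
```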

EDIT2: Just for fun, as a bonus, add this column to the outer SELECT:

```
'(' || STRING_AGG(orders::varchar,'+' ORDER BY tenant_id ASC) || ')' as bonus
```

see: DBFIDDLE

EDIT3: Added ORDER BY tenant_id ASC to STRING_AGG() so that the terms of the calculation appear in the order of the tenants that take part in the sum.

The result is:

datetime	sum	bonus
2023-09-15 22:00:00	2	(2)
2023-09-16 01:00:00	2	(2)
2023-09-16 02:00:00	8	(3+5)
2023-09-16 03:00:00	4	(4)
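Here is a minimal plain-Python model of the bonus column (not SQL). The names `ROWS` and `bonus_column` are hypothetical. It relies on the fact that STRING_AGG skips NULL inputs, so only tenants with a row at that exact timestamp appear, ordered by tenant_id.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (datetime, tenant_id, orders)
ROWS = [
    (datetime(2023, 9, 15, 22), "tenant3", 2),
    (datetime(2023, 9, 16, 1), "tenant1", 2),
    (datetime(2023, 9, 16, 2), "tenant1", 3),
    (datetime(2023, 9, 16, 2), "tenant2", 5),
    (datetime(2023, 9, 16, 3), "tenant1", 4),
]


def bonus_column(rows):
    """Model of '(' || STRING_AGG(orders::varchar, '+' ORDER BY tenant_id) || ')'
    grouped by datetime; NULLs from the LEFT JOIN never reach the string."""
    by_datetime = defaultdict(list)
    for dt, tenant, orders in rows:
        by_datetime[dt].append((tenant, orders))
    result = []
    for dt in sorted(by_datetime):
        # Sorting the (tenant_id, orders) pairs orders the terms by tenant_id.
        terms = [str(orders) for _, orders in sorted(by_datetime[dt])]
        result.append((dt, "(" + "+".join(terms) + ")"))
    return result


for dt, bonus in bonus_column(ROWS):
    print(dt, bonus)
```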

- Patrick
- September 16, 2023 at 12:25 pm
- 0 votes
```
SELECT dt AS datetime, sum(orders)
FROM (SELECT dt, max(datetime) AS datetime, tenant_id
      FROM generate_series(CURRENT_DATE::timestamp, CURRENT_DATE::timestamp + interval '23 hours', '1 hour') gs(dt)
      LEFT JOIN your_table t ON (t.datetime <= gs.dt)
      GROUP BY 1, 3) x
JOIN (SELECT datetime, tenant_id, orders FROM your_table) y USING (datetime, tenant_id)
GROUP BY 1
ORDER BY 1
```
Subquery x generates a time series of all hours of the current day, joins it to every row in your table at or before each hour (t.datetime <= gs.dt), and then finds the most recent datetime for each combination of time-series hour and tenant_id. Subquery y supplies the orders for each combination of datetime and tenant_id. The main query joins the two subqueries, which picks up the orders of each tenant's most recent row, and then sums the orders per hour.

This produces exactly the result you asked for, but I had to assume a couple of things, such as that you want output for the 24 hours of the current day. If you want to change that, tweak the generate_series() call. You can also further restrict the join condition in subquery x to only count orders from the current month, or whatever meets your needs.
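Here is a minimal plain-Python model of the x/y subquery approach (not SQL), run on the question's sample rows. The names `ROWS` and `latest_sum_per_hour` are hypothetical. The query uses CURRENT_DATE, while this sketch takes the day as an explicit argument. It mirrors one detail of the SQL: an hour with no earlier rows at all only produces NULLs in x, which the inner join with y drops.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (datetime, tenant_id, orders)
ROWS = [
    (datetime(2023, 9, 15, 22), "tenant3", 2),
    (datetime(2023, 9, 16, 1), "tenant1", 2),
    (datetime(2023, 9, 16, 2), "tenant1", 3),
    (datetime(2023, 9, 16, 2), "tenant2", 5),
    (datetime(2023, 9, 16, 3), "tenant1", 4),
]


def latest_sum_per_hour(rows, day):
    """Sum of each tenant's latest orders value for the 24 hours starting at `day`."""
    orders_at = {(dt, tenant): orders for dt, tenant, orders in rows}  # subquery y
    result = []
    for h in range(24):  # generate_series(day, day + 23 hours, 1 hour)
        hour = day + timedelta(hours=h)
        # Subquery x: max(datetime) per tenant among rows with datetime <= hour.
        latest = {}
        for dt, tenant, _ in rows:
            if dt <= hour and (tenant not in latest or dt > latest[tenant]):
                latest[tenant] = dt
        # Hours without any earlier row are dropped, like the inner join does.
        if latest:
            total = sum(orders_at[(dt, tenant)] for tenant, dt in latest.items())
            result.append((hour, total))
    return result


for hour, total in latest_sum_per_hour(ROWS, datetime(2023, 9, 16))[:5]:
    print(hour, total)
```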

- ErwinBrandstetter
- September 16, 2023 at 5:10 pm
- 0 votes
```
SELECT to_char(datetime, 'YYYY-MM-DD HH24:MI') AS today_hour  -- optional pretty print
     , sum(hour_sum) OVER (ORDER BY GREATEST(datetime, '2023-09-16 00:00')) AS sum
FROM   generate_series(timestamp '2023-09-16'        -- CURRENT_DATE::timestamp
                     , timestamp '2023-09-16 23:00'  -- CURRENT_DATE::timestamp + interval '23 hours'
                     , interval  '1 hour') AS g(datetime)
LEFT   JOIN (      
   SELECT GREATEST(datetime, '2023-09-16 00:00') AS datetime  -- CURRENT_DATE
        , sum(delta) AS hour_sum
   FROM  (
      SELECT datetime
           , orders - lag(orders, 1, 0) OVER (PARTITION BY tenant_id ORDER BY datetime) AS delta
      FROM   tbl
      ) sub1
   GROUP  BY 1
   ) sub2 USING (datetime)
ORDER  BY datetime;
```
fiddle

In your daily query, replace the date / timestamp literals with the commented expressions.

Step 1: subquery sub1

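Get the delta of orders for each tenant between the current row and the previous one (if any). The window function call lag(orders, 1, 0) is instrumental for this: its default of 0 makes a tenant's first row count in full. As a result, summing a tenant's deltas up to any point in time yields its latest order count at that point. See: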
- Find all rows in between a set in PostgreSQL
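Here is a minimal plain-Python model of step 1 (not SQL), using the question's sample rows. The names `ROWS` and `deltas` are hypothetical. It shows the telescoping property: each tenant's deltas add up to its latest orders value.

```python
from datetime import datetime

# Sample rows from the question: (datetime, tenant_id, orders)
ROWS = [
    (datetime(2023, 9, 15, 22), "tenant3", 2),
    (datetime(2023, 9, 16, 1), "tenant1", 2),
    (datetime(2023, 9, 16, 2), "tenant1", 3),
    (datetime(2023, 9, 16, 2), "tenant2", 5),
    (datetime(2023, 9, 16, 3), "tenant1", 4),
]


def deltas(rows):
    """Model of: orders - lag(orders, 1, 0) OVER (PARTITION BY tenant_id ORDER BY datetime)."""
    previous = {}  # last orders value seen per tenant
    result = []
    for dt, tenant, orders in sorted(rows, key=lambda r: (r[1], r[0])):
        # previous.get(tenant, 0) plays the role of lag()'s default value 0.
        result.append((dt, tenant, orders - previous.get(tenant, 0)))
        previous[tenant] = orders
    return result


for dt, tenant, delta in deltas(ROWS):
    print(dt, tenant, delta)
```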

Step 2: subquery sub2 / time-series g

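In sub2, aggregate the delta values per hour. GREATEST(datetime, '2023-09-16 00:00') folds every row from before today into the 00:00 bucket, so today's first hour starts with the totals carried over from earlier days.
In parallel, generate one row for every full hour of the day in g. See: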
- Generating time series between two dates in PostgreSQL
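Here is a minimal plain-Python model of step 2 (not SQL). The names `DAY`, `DELTAS`, `hour_sums` and `hours_of_day` are hypothetical, and `DELTAS` holds the values step 1 yields for the question's sample data. Like the query, it assumes the datetime values fall on full hours, because sub2 does not truncate them before the join.

```python
from collections import defaultdict
from datetime import datetime, timedelta

DAY = datetime(2023, 9, 16)  # stands in for CURRENT_DATE

# (datetime, delta) pairs as step 1 yields them for the sample data
DELTAS = [
    (datetime(2023, 9, 15, 22), 2),  # tenant3
    (datetime(2023, 9, 16, 1), 2),   # tenant1
    (datetime(2023, 9, 16, 2), 1),   # tenant1
    (datetime(2023, 9, 16, 2), 5),   # tenant2
    (datetime(2023, 9, 16, 3), 1),   # tenant1
]


def hour_sums(deltas, day):
    """Model of sub2: sum(delta) grouped by GREATEST(datetime, day 00:00)."""
    sums = defaultdict(int)
    for dt, delta in deltas:
        sums[max(dt, day)] += delta  # rows from earlier days land in the 00:00 bucket
    return dict(sums)


def hours_of_day(day):
    """Model of g: generate_series(day, day + 23 hours, 1 hour)."""
    return [day + timedelta(hours=h) for h in range(24)]


print(sorted((dt.hour, s) for dt, s in hour_sums(DELTAS, DAY).items()))
```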

Step 3

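LEFT JOIN g to sub2 to preserve exactly one output row per hour, then compute a running sum with the window function sum(hour_sum) OVER (ORDER BY ...). Hours without any change get a NULL hour_sum, which sum() ignores, so the previous total is carried forward.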
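Here is a minimal plain-Python model of step 3 (not SQL). The names `DAY`, `HOUR_SUMS` and `running_sum` are hypothetical, and `HOUR_SUMS` holds sub2's output for the sample data. `None` stands in for the NULL that the LEFT JOIN produces for hours without changes. As in SQL, the running sum stays NULL until the first non-NULL value and then carries forward.

```python
from datetime import datetime, timedelta

DAY = datetime(2023, 9, 16)  # stands in for CURRENT_DATE

# sub2 output for the sample data: hour -> summed deltas
HOUR_SUMS = {
    datetime(2023, 9, 16, 0): 2,
    datetime(2023, 9, 16, 1): 2,
    datetime(2023, 9, 16, 2): 6,
    datetime(2023, 9, 16, 3): 1,
}


def running_sum(hour_sums, day):
    """Model of: g LEFT JOIN sub2, then sum(hour_sum) OVER (ORDER BY datetime)."""
    result = []
    total = None  # a SQL running sum() is NULL until it sees a non-NULL value
    for h in range(24):
        hour = day + timedelta(hours=h)
        value = hour_sums.get(hour)  # None models the NULL from the LEFT JOIN
        if value is not None:
            total = value if total is None else total + value
        result.append((hour, total))
    return result


print([total for _, total in running_sum(HOUR_SUMS, DAY)][:6])
```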

Alternative query for comparison

The idea behind Patrick’s query can be implemented more efficiently with DISTINCT ON:
```
SELECT datetime, sum(orders)
FROM (
   SELECT DISTINCT ON (g.datetime, tenant_id)
          g.datetime, orders
   FROM   generate_series(timestamp '2023-09-16', timestamp '2023-09-16 23:00', '1 hour') g(datetime)
   LEFT   JOIN tbl t ON t.datetime <= g.datetime
   ORDER  BY g.datetime, tenant_id, t.datetime DESC NULLS LAST
   ) x
GROUP BY 1
ORDER BY 1;
```
See:
- Select first row in each GROUP BY group?

Much simpler and faster than Patrick's query, but still not as fast as my first query, which scales much better with more tenants and/or more entries per tenant.

fiddle
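Here is a minimal plain-Python model of the DISTINCT ON alternative (not SQL), on the question's sample rows. The names `ROWS` and `distinct_on_sum` are hypothetical. It reproduces the ORDER BY with two stable sorts and keeps the first row per (hour, tenant), which is what DISTINCT ON does. An hour with no earlier rows would yield a single NULL row from the LEFT JOIN, so its sum is `None`.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (datetime, tenant_id, orders)
ROWS = [
    (datetime(2023, 9, 15, 22), "tenant3", 2),
    (datetime(2023, 9, 16, 1), "tenant1", 2),
    (datetime(2023, 9, 16, 2), "tenant1", 3),
    (datetime(2023, 9, 16, 2), "tenant2", 5),
    (datetime(2023, 9, 16, 3), "tenant1", 4),
]


def distinct_on_sum(rows, day):
    result = []
    for h in range(24):  # generate_series over the hours of `day`
        hour = day + timedelta(hours=h)
        joined = [r for r in rows if r[0] <= hour]  # LEFT JOIN tbl t ON t.datetime <= g.datetime
        # ORDER BY tenant_id, t.datetime DESC: sort by the secondary key first, then stable-sort.
        joined.sort(key=lambda r: r[0], reverse=True)
        joined.sort(key=lambda r: r[1])
        kept, seen = [], set()
        for _, tenant, orders in joined:
            if tenant not in seen:  # DISTINCT ON keeps the first row per (hour, tenant)
                seen.add(tenant)
                kept.append(orders)
        result.append((hour, sum(kept) if kept else None))
    return result


print([total for _, total in distinct_on_sum(ROWS, datetime(2023, 9, 16))][:5])
```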
