I’ve a Postgres table like this where a order count for each tenant is inserted when it changes.
datetime | tenant_id | orders |
---|---|---|
2023-09-15 22:00 | tenant3 | 2 |
2023-09-16 01:00 | tenant1 | 2 |
2023-09-16 02:00 | tenant1 | 3 |
2023-09-16 02:00 | tenant2 | 5 |
2023-09-16 03:00 | tenant1 | 4 |
Note that the first row for tenant3 is from yesterday. The number of tenants is dynamic.
Is it possible, based on this data, to build a "time series"-like result where I sum of the newest orders count for all tenant, in a result like this:
datetime | sum |
---|---|
2023-09-16 00:00 | 2 |
2023-09-16 01:00 | 4 |
2023-09-16 02:00 | 10 |
2023-09-16 03:00 | 11 |
2023-09-16 04:00 | 11 |
… | … |
2023-09-16 23:00 | … |
4
Answers
If you also want to include dates without orders, there are a few ways to do this. The way that will work on any rbdms is to join to a utility table that consists only of one column with one row per integer, starting from zero.
To create utility:
And repeat the last line until utility has enough records to cover all the days you need.
output:
see: DBFIDDLE
EDIT: Because there are a dynamic number of tenants.
output:
see: DBFIDDLE
EDIT2: just for fun, as bonus:
adding
'(' || STRING_AGG(orders::varchar,'+' ORDER BY tenant_id ASC) || ')' as bonus
see: DBFIDDLE
EDIT3: added
ORDER BY tenant_id ASC
toSTRING_AGG()
to make sure the calculation is in the order of tenants that take part in the sum.will result in:
Subquery
x
generates the time series consisting of all hours of the current day and then joins that to all the rows in your table that are earlier in time (t.datetime <= gs.dt
) and then finds the most recent row for each combination oftenant_id
and the time series period. In subqueryy
theorders
for each combination of most recentdatetime
andtenant_id
are found. In the main query the two subqueries are joined and then the sum of theorders
is calculated.This produces exactly the result you were asking for, but I had to assume a couple of things, such as that you wanted output for 24 hours of the current day. If you want to change that you should tweak the
generate_series()
call. You can also further restrict the join condition in subqueryx
to only count orders from the current month or whatever meets your need.fiddle
In your daily query, replace the date / timestamp literals with the commented expressions.
Step 1: subquery
sub1
Get the delta of
orders
for each tenant between the current row and the previous one (if any). The window function calllag(orders, 1, 0)
is instrumental for this. See:Step 2: subquery
sub2
/ time-seriesg
Aggregate
delta
values per hour, starting with today 00:00 insub2
.In parallel, generate one row for every full hour of the day in
g
. See:Step 3
LEFT JOIN
to preserve exactly one output row per hour, then compute a running sum with the window functionsum(hour_sum)
Alternative query for comparison
The idea behind Patrick’s query can be implemented more efficiently with
DISTINCT ON
:See:
Much simpler and faster. But not as fast as my first query, which scales much better for more tenants and/or more entries per tenant.
fiddle