I thought I understood GENERATE_SERIES(), but obviously not as well as I thought.
I have a table:
CREATE TABLE status
(
start_tz TIMESTAMPTZ NOT NULL,
end_tz TIMESTAMPTZ NOT NULL,
ms TEXT NOT NULL,
sensor_id SMALLINT NOT NULL,
);
And I wish to fill this table with simulated data from 2022-01-01 00:00:00
in 20 minute intervals for 4 sensors (numbered 1 – 4) for 2 hours (or whatever – no hardcoding). The ms
text is random – see below.
I have tried various combinations of
SELECT
GENERATE_SERIES('2022-01-01 00:00:00', '2022-01-01 02:00:00', (INTERVAL '20 MINUTES')),
GENERATE_SERIES('2022-01-01 00:20:00', '2022-01-01 02:20:00', (INTERVAL '20 MINUTES')),
CASE
WHEN random() > 0.5 THEN 'in_motion)'
ELSE 'stationary'
END,
GENERATE_SERIES(1, 4);
and just can’t get it. I also have various cominations and permutations of FROM GENERATE_SERIES(1, ... whatever guess...);
and I’m stumped.
What I want looks like this:
start_tz end_tz ms sensor_id
2022-01-01 00:00:00+00 2022-01-01 00:20:00+00 stationary 1
2022-01-01 00:20:00+00 2022-01-01 00:40:00+00 stationary 1
2022-01-01 00:40:00+00 2022-01-01 01:00:00+00 in_motion) 1
2022-01-01 01:00:00+00 2022-01-01 01:20:00+00 in_motion) 1
2022-01-01 01:20:00+00 2022-01-01 01:40:00+00 stationary 1
2022-01-01 01:40:00+00 2022-01-01 02:00:00+00 in_motion) 1
2022-01-01 02:00:00+00 2022-01-01 02:20:00+00 stationary 1
2022-01-01 00:00:00+00 2022-01-01 00:20:00+00 stationary 2
2022-01-01 00:20:00+00 2022-01-01 00:40:00+00 stationary 2
2022-01-01 00:40:00+00 2022-01-01 01:00:00+00 in_motion) 2
...
...
and so on
A fiddle is available here. The correct answer will give some explanation of where I was going wrong – I don’t want to be given a fish, I would like to be taught how to do a little fishing of my own!
2
Answers
The behavior you want would have been produced in 9.6, but perhaps only by accident. Back then multiple SRF in the select list would be cycled until all of them were exhausted at the same time, returning a number of rows which was the Least Common Multiple of each isolated row counts. Because 7 is prime, this would do what you apparently want. This was changed in v10, as a known side effect of other refactoring I think, so that each SRF was cycle just once with the "short" ones being augmented with NULLs.
To get what you want you can put one of the SRF into the FROM list:
But I think it is still discouraged to put SRF in the SELECT-list. The problem is that putting both of the time ones in the FROM-list would return 7*7*4 rows, not LCM(7,7,4) rows. To get around that, you would reduce it to one SRF for the time, and use an expression to get the parallel offset time point.
So this is the way I would do it. It is also less repetitive, repeating only the interval rather than the offsat start, offsat stop, and interval.
GENERATE_SERIES
returns a table, so to get rows from it you should only use it in theFROM
clause of a query, not theSELECT
part."generate_series"
– because that’s also the name of the function itself it’s a good idea to escape the column name (with double-quotes) to avoid potential trouble.end_tz
column values are always 20-minutes ahead ofstart_tz
you don’t (and shouldn’t) useGENERATE_SERIES
again, instead define a new query column that simply adds 20 minutes to the already-generatedstart_tz
value.sensor_id
can be done by re-querying that generated data 4 times over withUNION
.'in_motion'
/stationary'
column in the last query step.Like so:
Here’s the DBFiddle, screenshot below: