Consider this [v good] example from Using recursive CTE to generate a range of data
WITH RECURSIVE date_range AS (
SELECT '2023-01-01'::timestamp AS date
UNION ALL
SELECT date + interval '1 day'
FROM date_range
WHERE date < '2023-12-31'
)
SELECT *
FROM date_range;
I can understand how to create this query, but do not understand the underlying logic/rules executed by the database engine. Specifically: for the select date + 'interval 1 day' from date_range
: how does it know to only "consider" the very last added record [instead of taking all the previously added records and adding 1 to each of them]?
To illustrate more clearly: consider when we are at 2023-01-03
in the recursion. In that case we already presumably have 2023-01-01
, 2023-01-02
, and 2023-01-03
in the table. Why does the second query shown below not "find" three records and add 1 to all of those 3 records?
SELECT date + interval '1 day'
FROM date_range
WHERE date < '2023-12-31'
Instead what is happening is the engine only "finds" the very last record. 2023-01-03
and adds one to that to generate the single new record 2023-01-04
? Is that just something I take on "faith" for recursive
views?
2
Answers
From the link noted by @nbk postgresql Recursive Queries: the key step is this:
So for the query mentioned in the question the intermediate table would apparently be only the newly generated result [and not any others previously generated by the recursion].
Because there are 3 different tables.
Results
keep getting accumulated and that’s where the2023-01-01
,2023-01-02
, and2023-01-03
live – the query doesn’t target that. Instead it targetsworking
which gets wiped and replaced with contents ofintermediate
from previous iteration – at that point it only holds2023-01-03
:non-recursive term
(‘2023-01-02’)
previous values from intermediate
(‘2023-01-02’),
(‘2023-01-03’)
previous values from intermediate
(‘2023-01-02’),
(‘2023-01-03’),
(‘2023-01-04’)
Behaviour above illustrates the description from the doc:
If you think about it, the "taking all the previously added records and adding 1 to each of them" thing is exactly what it does. All added previously, as in previous iteration, not all processed since the beginning of operation. In your case, it started with a single row, so in each iteration all previously added records amount to 1. If you started with 2, you’d get 2 at a time, for as long as both match the
WHERE
condition.