skip to Main Content

Consider this [v good] example from Using recursive CTE to generate a range of data

WITH RECURSIVE date_range AS (
  SELECT '2023-01-01'::timestamp  AS date
  UNION ALL
  SELECT date + interval '1 day'
  FROM date_range
  WHERE date < '2023-12-31'
)
SELECT *
FROM date_range;

I can understand how to create this query, but do not understand the underlying logic/rules executed by the database engine. Specifically: for the select date + 'interval 1 day' from date_range: how does it know to only "consider" the very last added record [instead of taking all the previously added records and adding 1 to each of them]?

To illustrate more clearly: consider when we are at 2023-01-03 in the recursion. In that case we already presumably have 2023-01-01, 2023-01-02, and 2023-01-03 in the table. Why does the second query shown below not "find" three records and add 1 to all of those 3 records?

  SELECT date + interval '1 day'
  FROM date_range
  WHERE date < '2023-12-31'

Instead what is happening is the engine only "finds" the very last record. 2023-01-03 and adds one to that to generate the single new record 2023-01-04? Is that just something I take on "faith" for recursive views?

2

Answers


  1. Chosen as BEST ANSWER

    From the link noted by @nbk postgresql Recursive Queries: the key step is this:

    Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.

    So for the query mentioned in the question the intermediate table would apparently be only the newly generated result [and not any others previously generated by the recursion].


  2. consider when we are at 2023-01-03 in the recursion. In that case we already presumably have 2023-01-01, 2023-01-02, and 2023-01-03 in the table. Why does the second query shown below not "find" three records and add 1 to all of those 3 records

    Because there are 3 different tables. Results keep getting accumulated and that’s where the 2023-01-01, 2023-01-02, and 2023-01-03 live – the query doesn’t target that. Instead it targets working which gets wiped and replaced with contents of intermediate from previous iteration – at that point it only holds 2023-01-03:

    iteration working table intermediate table results
    1 (‘2023-01-01’)
    non-recursive term
    (‘2023-01-01’+interval ‘1 day’) (‘2023-01-01’),
    (‘2023-01-02’)
    2 (‘2023-01-02’)
    previous values from intermediate
    (‘2023-01-02’+interval ‘1 day’) (‘2023-01-01’),
    (‘2023-01-02’),
    (‘2023-01-03’)
    3 (‘2023-01-03’)
    previous values from intermediate
    (‘2023-01-03’+interval ‘1 day’) (‘2023-01-01’),
    (‘2023-01-02’),
    (‘2023-01-03’),
    (‘2023-01-04’)

    Behaviour above illustrates the description from the doc:

    1. Evaluate the non-recursive term. (…) Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.

    2. So long as the working table is not empty, repeat these steps:

      a) Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. (…) Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.

      b) Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table


    If you think about it, the "taking all the previously added records and adding 1 to each of them" thing is exactly what it does. All added previously, as in previous iteration, not all processed since the beginning of operation. In your case, it started with a single row, so in each iteration all previously added records amount to 1. If you started with 2, you’d get 2 at a time, for as long as both match the WHERE condition.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search