Several (20+) tables in my database define columns current_since::timestamptz and current_until::timestamptz. Each row in each of those tables follows the same rules:
- If current_since is null, the row is considered a "draft".
- If current_since is not null and current_until is null, the row is considered "current after the current_since moment, until forever".
- If both current_since and current_until are set, the row is considered current between the two moments.
Each table uses a btree index on (current_since, current_until).
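For illustration, one of these tables boils down to something like this (payload columns omitted; idx_current_at is the index name that appears in the plans below):

CREATE TABLE record (
    id bigint PRIMARY KEY,
    current_since timestamptz,
    current_until timestamptz
);

CREATE INDEX idx_current_at ON record (current_since, current_until);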
There are many different SQL queries where I want to ensure I only load rows current at a particular time (usually an arbitrary user input). For that, a condition might look like this:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM record
WHERE
:timestamp BETWEEN record.current_since AND record.current_until
OR (
record.current_since <= :timestamp
AND record.current_until IS NULL
)
The execution plan (using SET enable_seqscan = OFF; and NOW() as the :timestamp value) is as follows:
Bitmap Heap Scan on record (cost=23.97..42.32 rows=374 width=300) (actual time=0.056..0.246 rows=374 loops=1)
Recheck Cond: (((now() >= current_since) AND (now() <= current_until)) OR ((current_since <= now()) AND (current_until IS NULL)))
Filter: (((now() >= current_since) AND (now() <= current_until)) OR ((current_since <= now()) AND (current_until IS NULL)))
Heap Blocks: exact=9
Buffers: shared hit=11
-> BitmapOr (cost=23.97..23.97 rows=374 width=0) (actual time=0.040..0.041 rows=0 loops=1)
Buffers: shared hit=2
-> Bitmap Index Scan on idx_current_at (cost=0.00..11.89 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
Index Cond: ((current_since <= now()) AND (current_until >= now()))
Buffers: shared hit=1
-> Bitmap Index Scan on idx_current_at (cost=0.00..11.89 rows=374 width=0) (actual time=0.034..0.034 rows=374 loops=1)
Index Cond: ((current_since <= now()) AND (current_until IS NULL))
Buffers: shared hit=1
Planning:
Buffers: shared hit=4
Planning Time: 0.194 ms
Execution Time: 0.324 ms
It gets bad quickly when several of these tables take part in a larger join query:
SELECT * FROM record
LEFT JOIN comment
ON comment.record_id = record.id
AND (
:timestamp BETWEEN comment.current_since AND comment.current_until
OR (
comment.current_since <= :timestamp
AND comment.current_until IS NULL
)
)
LEFT JOIN attachment
ON attachment.comment_id = comment.id
AND (
:timestamp BETWEEN attachment.current_since AND attachment.current_until
OR (
attachment.current_since <= :timestamp
AND attachment.current_until IS NULL
)
)
WHERE
:timestamp BETWEEN record.current_since AND record.current_until
OR (
record.current_since <= :timestamp
AND record.current_until IS NULL
)
^ that’s a lot of "clutter" IMO.
My question is: how can I avoid writing these fairly complex conditions manually every time?
If I ever decide to change the logic, I will have to update many queries manually. Yet, I would rather do that than sacrifice performance.
Keep in mind the timestamp isn’t a fixed NOW() value; often a user provides a custom timestamp to fetch data current at a particular point in time in the past.
I have tried to write a stored function returning a boolean:
CREATE FUNCTION is_current(
at TIMESTAMP WITH TIME ZONE,
current_since TIMESTAMP WITH TIME ZONE,
current_until TIMESTAMP WITH TIME ZONE
)
RETURNS BOOLEAN
IMMUTABLE
LANGUAGE plpgsql
AS
$$
BEGIN
IF current_since IS NULL THEN
RETURN FALSE;
END IF;
IF current_since <= at AND current_until IS NULL THEN
RETURN TRUE;
END IF;
RETURN at BETWEEN current_since AND current_until;
END
$$;
Using it instead of the explicit condition:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM record WHERE
is_current(:timestamp, record.current_since, record.current_until)
But it was significantly slower, as I expected:
Seq Scan on record (cost=10000000000.00..10000000107.17 rows=125 width=300) (actual time=27.852..28.052 rows=374 loops=1)
Filter: is_current(now(), current_since, current_until)
Buffers: shared hit=9
Planning Time: 0.117 ms
JIT:
Functions: 2
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.400 ms, Inlining 3.039 ms, Optimization 16.067 ms, Emission 8.698 ms, Total 28.204 ms
Execution Time: 28.509 ms
Bonus question: is it so much slower because using a function like this prevents the query executor from accessing the columns directly, forcing it to call the function for every row and then work with the boolean result?
3 Answers
Create a view for each table and put all your basic conditions in these views. Just query these views for the rest of your life and never add the basic conditions again.
That’s also why I try to avoid tables in the public schema: only views and functions for public access live there, no data.
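For instance, a view pinned to the current moment could look like this (a sketch; record_current is a hypothetical name, and for an arbitrary user-supplied timestamp you would need a function taking the timestamp as a parameter instead of a plain view):

CREATE VIEW record_current AS
SELECT *
FROM record
WHERE now() BETWEEN current_since AND current_until
   OR (current_since <= now() AND current_until IS NULL);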
If you are able to change the schema, the best solution would be to drop current_since and current_until and use e.g. current_range, of type tstzrange, and populate it with something like this (a sketch; the '[]' bounds mirror the inclusive BETWEEN, and the CASE leaves draft rows NULL):
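ALTER TABLE record ADD COLUMN current_range tstzrange;

UPDATE record
SET current_range = CASE
        WHEN current_since IS NULL THEN NULL               -- draft rows
        ELSE tstzrange(current_since, current_until, '[]') -- inclusive bounds, like BETWEEN
    END;

Selecting current records then becomes simply:

SELECT * FROM record
WHERE :timestamp <@ current_range;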
In this setup, the <@ operator effectively becomes the same as using BETWEEN, and we don’t need to test for "draft" rows – they won’t be selected, because for these rows current_range would be NULL.

If you cannot drop these columns, but you can add new ones, adding a generated column might do the trick (a sketch, assuming PostgreSQL 12 or later for stored generated columns):
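ALTER TABLE record
    ADD COLUMN current_range tstzrange GENERATED ALWAYS AS (
        CASE WHEN current_since IS NULL THEN NULL               -- draft rows
             ELSE tstzrange(current_since, current_until, '[]')
        END
    ) STORED;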
In this way, one can continue to use the original timestamptz columns, but still rely on the new (effectively read-only) range column for filtering.

If changing the schema is not an option at all, then putting the above expression in a view will effectively achieve the same. In this case, however, the performance will be bad, because queries using <@ on the virtual current_range column won’t be able to use the index on (current_since, current_until). There is a solution for that too, though: a functional index. GIST indexes on the tstzrange expression – whether functional or not – should in fact be significantly more efficient for filtering the rows than the current composite btree index. A sketch (the index name is illustrative, and the indexed expression has to match the view’s expression for the planner to pick it up):
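CREATE INDEX idx_record_current_range ON record USING gist (
    (CASE WHEN current_since IS NULL THEN NULL
          ELSE tstzrange(current_since, current_until, '[]')
     END)
);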
Your "fairly complex conditions" could be a bit simpler:
… and could all be moved from the ON() clauses to the WHERE clause.
I’m afraid that (in this scenario) you need all the filters anyway, but below is a sketch of the query with all of them together (the join conditions stay in the ON clauses to preserve the outer joins):
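SELECT *
FROM record
LEFT JOIN comment
       ON comment.record_id = record.id
      AND comment.current_since <= :timestamp
      AND (comment.current_until IS NULL OR comment.current_until >= :timestamp)
LEFT JOIN attachment
       ON attachment.comment_id = comment.id
      AND attachment.current_since <= :timestamp
      AND (attachment.current_until IS NULL OR attachment.current_until >= :timestamp)
WHERE record.current_since <= :timestamp
  AND (record.current_until IS NULL OR record.current_until >= :timestamp)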
See the sample with results on dbfiddle here.