Several (20+) tables in my database define columns current_since::timestamptz and current_until::timestamptz. Each row in each of those tables follows the same rules:
- If current_since is null, the row is considered a "draft".
- If current_since is not null and current_until is null, the row is considered "current after the current_since moment, until forever".
- If both current_since and current_until are set, the row is considered current between the two moments.
Each table uses a btree index on (current_since, current_until).
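For illustration, one of these tables boils down to something like this (payload columns omitted; idx_current_at is the index name that appears in the plans below):

CREATE TABLE record (
    id bigint PRIMARY KEY,
    current_since timestamptz,
    current_until timestamptz
);

CREATE INDEX idx_current_at ON record (current_since, current_until);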
There are many different SQL queries where I want to ensure I only load rows current at a particular time (usually an arbitrary user input). For that, a condition might look like this:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM record
WHERE
:timestamp BETWEEN record.current_since AND record.current_until
OR (
record.current_since <= :timestamp
AND record.current_until IS NULL
)
The execution plan (using SET enable_seqscan = OFF; and NOW() as the :timestamp value) is as follows:
Bitmap Heap Scan on record (cost=23.97..42.32 rows=374 width=300) (actual time=0.056..0.246 rows=374 loops=1)
Recheck Cond: (((now() >= current_since) AND (now() <= current_until)) OR ((current_since <= now()) AND (current_until IS NULL)))
Filter: (((now() >= current_since) AND (now() <= current_until)) OR ((current_since <= now()) AND (current_until IS NULL)))
Heap Blocks: exact=9
Buffers: shared hit=11
-> BitmapOr (cost=23.97..23.97 rows=374 width=0) (actual time=0.040..0.041 rows=0 loops=1)
Buffers: shared hit=2
-> Bitmap Index Scan on idx_current_at (cost=0.00..11.89 rows=1 width=0) (actual time=0.005..0.005 rows=0 loops=1)
Index Cond: ((current_since <= now()) AND (current_until >= now()))
Buffers: shared hit=1
-> Bitmap Index Scan on idx_current_at (cost=0.00..11.89 rows=374 width=0) (actual time=0.034..0.034 rows=374 loops=1)
Index Cond: ((current_since <= now()) AND (current_until IS NULL))
Buffers: shared hit=1
Planning:
Buffers: shared hit=4
Planning Time: 0.194 ms
Execution Time: 0.324 ms
It gets bad quickly when several of these tables take part in a larger join query:
SELECT * FROM record
LEFT JOIN comment
ON comment.record_id = record.id
AND (
:timestamp BETWEEN comment.current_since AND comment.current_until
OR (
comment.current_since <= :timestamp
AND comment.current_until IS NULL
)
)
LEFT JOIN attachment
ON attachment.comment_id = comment.id
AND (
:timestamp BETWEEN attachment.current_since AND attachment.current_until
OR (
attachment.current_since <= :timestamp
AND attachment.current_until IS NULL
)
)
WHERE
:timestamp BETWEEN record.current_since AND record.current_until
OR (
record.current_since <= :timestamp
AND record.current_until IS NULL
)
^ that’s a lot of "clutter" IMO.
My question is: how can I avoid writing these fairly complex conditions manually every time?
If I ever decide to change the logic, I will have to update many queries manually. Yet, I would rather do that than sacrifice performance.
Keep in mind the timestamp isn’t a fixed NOW() value; often a user provides a custom timestamp to fetch data current at a particular point in time in the past.
I have tried to write a stored function returning a boolean:
CREATE FUNCTION is_current(
at TIMESTAMP WITH TIME ZONE,
current_since TIMESTAMP WITH TIME ZONE,
current_until TIMESTAMP WITH TIME ZONE
)
RETURNS BOOLEAN
IMMUTABLE
LANGUAGE plpgsql
AS
$$
BEGIN
IF current_since IS NULL THEN
RETURN FALSE;
END IF;
IF current_since <= at AND current_until IS NULL THEN
RETURN TRUE;
END IF;
RETURN at BETWEEN current_since AND current_until;
END
$$;
Using it instead of the explicit condition:
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM record WHERE
is_current(:timestamp, record.current_since, record.current_until)
But it was significantly slower, as I expected:
Seq Scan on record (cost=10000000000.00..10000000107.17 rows=125 width=300) (actual time=27.852..28.052 rows=374 loops=1)
Filter: is_current(now(), current_since, current_until)
Buffers: shared hit=9
Planning Time: 0.117 ms
JIT:
Functions: 2
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.400 ms, Inlining 3.039 ms, Optimization 16.067 ms, Emission 8.698 ms, Total 28.204 ms
Execution Time: 28.509 ms
Bonus question: is it so much slower because using a function like this prevents the query executor from accessing the columns directly, forcing it to call the function for every row and then work with the boolean result?
3 Answers
Create a view for each table and put all your basic conditions in these views. Just query these views for the rest of your life and never add the basic conditions again.
That’s also why I try to avoid tables in the public schema: only views and functions for public access live there, no data.
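For instance, a view pinned to the current moment could look like this (a sketch; record_current is a hypothetical name, and for an arbitrary user-supplied timestamp you would need a function taking the timestamp as a parameter instead of a plain view):

CREATE VIEW record_current AS
SELECT *
FROM record
WHERE now() BETWEEN current_since AND current_until
   OR (current_since <= now() AND current_until IS NULL);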
If you are able to change the schema, the best solution would be to drop current_since and current_until and use e.g. current_range, of type tstzrange, and populate it with something like this (a sketch; the '[]' bounds mirror the inclusive BETWEEN, and the CASE leaves draft rows NULL):
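ALTER TABLE record ADD COLUMN current_range tstzrange;

UPDATE record
SET current_range = CASE
        WHEN current_since IS NULL THEN NULL               -- draft rows
        ELSE tstzrange(current_since, current_until, '[]') -- inclusive bounds, like BETWEEN
    END;

Selecting current records then becomes simply:

SELECT * FROM record
WHERE :timestamp <@ current_range;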
In this setup, the <@ operator effectively becomes the same as using BETWEEN, and we don’t need to test for "draft" rows – they won’t be selected, because for these rows current_range would be NULL.

If you cannot drop these columns, but you can add new ones, adding a generated column might do the trick (a sketch, assuming PostgreSQL 12 or later for stored generated columns):
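ALTER TABLE record
    ADD COLUMN current_range tstzrange GENERATED ALWAYS AS (
        CASE WHEN current_since IS NULL THEN NULL               -- draft rows
             ELSE tstzrange(current_since, current_until, '[]')
        END
    ) STORED;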
In this way, one can continue to use the original timestamptz columns, but still rely on the new (effectively read-only) range column for filtering.

If changing the schema is not an option at all, then putting the above expression in a view will effectively achieve the same. In this case, however, the performance will be bad, because queries using <@ on the virtual current_range column won’t be able to use the index on (current_since, current_until). There is a solution for that too, though: a functional index. GIST indexes on the tstzrange expression – whether functional or not – should in fact be significantly more efficient for filtering the rows than the current composite btree index. A sketch (the index name is illustrative, and the indexed expression has to match the view’s expression for the planner to pick it up):
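CREATE INDEX idx_record_current_range ON record USING gist (
    (CASE WHEN current_since IS NULL THEN NULL
          ELSE tstzrange(current_since, current_until, '[]')
     END)
);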
Your "fairly complex conditions" could be a bit simpler:
… and could all be moved from the ON() clauses to the WHERE clause.
I’m afraid that (in this scenario) you need all the filters anyway, but below is a sketch of the query with all of them together (the join conditions stay in the ON clauses to preserve the outer joins):
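SELECT *
FROM record
LEFT JOIN comment
       ON comment.record_id = record.id
      AND comment.current_since <= :timestamp
      AND (comment.current_until IS NULL OR comment.current_until >= :timestamp)
LEFT JOIN attachment
       ON attachment.comment_id = comment.id
      AND attachment.current_since <= :timestamp
      AND (attachment.current_until IS NULL OR attachment.current_until >= :timestamp)
WHERE record.current_since <= :timestamp
  AND (record.current_until IS NULL OR record.current_until >= :timestamp)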
See the sample with results on dbfiddle here.