I have a table containing around 95 million rows, pretty much only storing a timestamp and further IDs of related tables containing the actual data in the end.
CREATE TABLE
clt_rec
(
id BIGSERIAL NOT NULL,
oms_rec BIGINT NOT NULL,
captured_at TIMESTAMP(6) WITH TIME ZONE NOT NULL,
rssi SMALLINT NOT NULL,
CONSTRAINT pk_clt_rec PRIMARY KEY (id),
CONSTRAINT fk_clt_rec_oms_rec FOREIGN KEY (oms_rec) REFERENCES "oms_rec" ("id"),
CONSTRAINT uk_clt_rec_oms_rec UNIQUE (oms_rec)
);
In many use cases I need to search all of those rows based on their timestamp to find rows arbitrary in the past: Sometimes it’s only 15 minutes into the past, sometimes it’s 2 years, sometimes it’s finding the first day of each month over 15 months for some of those telegrams etc. In the end, I pretty often need to compare those timestamps and some queries simply take multiple seconds in the end, especially adding up if multiple, but slightly different queries need to be executed one after another. The following are two abstracts of Postgres’ query plans:
Plan 1:
-> Nested Loop (cost=1.14..343169.49 rows=43543 width=20) (actual time=0.313..113.974 rows=34266 loops=3)
-> Parallel Index Scan using idx_clt_rec_captured_at on clt_rec (cost=0.57..3437.90 rows=43543 width=24) (actual time=0.153..20.192 rows=34266 loops=3)
Index Cond: ((captured_at >= ('2020-08-01 00:00:00+02'::timestamp with time zone - '00:00:00'::interval)) AND (captured_at <= ('2020-08-01 00:00:00+02'::timestamp with time zone + '1 day'::interval)))
-> Index Scan using pk_oms_rec on oms_rec (cost=0.57..7.80 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=102799)
Index Cond: (id = clt_rec.oms_rec)
Plan 2:
-> Nested Loop (cost=1.14..836381.50 rows=111934 width=20) (actual time=0.379..911.697 rows=334465 loops=3)
-> Parallel Index Scan using idx_clt_rec_captured_at on clt_rec (cost=0.57..8819.57 rows=111934 width=24) (actual time=0.193..154.176 rows=334465 loops=3)
Index Cond: ((captured_at >= ('2020-08-28 10:21:06.968+02'::timestamp with time zone - '14 days'::interval)) AND (captured_at <= ('2020-08-28 10:21:06.968+02'::timestamp with time zone + '00:00:00'::interval)))
-> Index Scan using pk_oms_rec on oms_rec (cost=0.57..7.39 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=1003394)
Index Cond: (id = clt_rec.oms_rec)
Postgres seems to properly use available indexes, parallel workers and stuff like that. But looking at the actual times and compared to all the other parts of the query, comparing those timestamps simply takes the most time.
I’ve looked into this topic and found statements about that one shouldn’t put too many rows into the index and stuff like that or it will be ignored at all. But that doesn’t seem to be the case for me according to the plan. OTOH, my index really simply is about the column containing the timestamp, no function reducing things to dates or stuff like that to reduce the number of rows.
CREATE INDEX idx_clt_rec_captured_at ON clt_rec USING btree ( captured_at );
So where does the actual runtime come from in the above plan? Is it simply comparing lots of timestamps and that takes how long it takes? Or is it calculating with those timestamps already?
Thanks!
2
Answers
The time comes from the 102799 iterations of the index scan on the inner table.
Each of these index scans takes a (rounded) 0.002 milliseconds on average, so you end up with 102799 * 0.002 = 206 milliseconds. That is actually more than the whole query takes; the difference must be due to rounding errors.
That looks quite alright, and it is hard to suggest an improvement.
The dominant run time is neither comparing timestamps, nor calculating with those timestamps.
Take your second plan. Only 154.176 out of 911.697 ms are spent on the actual timestamp index part of the query, which is about 15% of the total. And of even that time, it isn’t clear how much of it is spent "comparing timestamps", versus doing other work like reading data from disk, or jumping between the index and the table to get the oms_rec value. Presumably the comparing of the timestamps is a small fraction of what is already a small part of the total time.
If you turn track_io_timing on and then do
EXPLAIN (ANALYZE, BUFFERS)
, it would provide some insight into the first of those, the time spent reading from disk. You can perhaps reduce the second (and thus also first) of those, jumping from index to table, by including osm_rec into your index definition (after captured_at) so it could do an index-only-scan instead.So where is the rest of the time, 911 – 154, going? It is going to the "Index Scan using pk_oms_rec on oms_rec", which doesn’t involve timestamps at all. Does this makes sense? Well, looking again at the plan output, actual time * loops / parallel workers, 0.002 * 1003394 / 3 = 670ms of clock time. Now this is very uncertain because "actual time" is reported to only one significant digit, but it certainly seems to represent about the right amount of time. I’m no exactly sure I’ve dealt properly with the parallel worker component worker there (maybe it was already accounted for and shouldn’t be accounted for again?), but even so a 3-fold difference is probably not enough to change the conclusion given that we only have one sig fig to start with. It is a lot easier to interpret plans if you turn parallelization off first (max_parallel_workers_per_gather=0).