I’m trying to find out why a simple aggregate query performs, from my point of view, really slowly on Postgres 15. I have a growing table with 10 million records of sensor data:
CREATE TABLE public.ts_number (
id int4 NOT NULL,
ts int8 NOT NULL,
val float4 NULL,
CONSTRAINT ts_number_pkey PRIMARY KEY (id, ts)
);
CREATE INDEX ts_number_id_idx ON public.ts_number USING btree (id, ts, val);
I want to aggregate all data into 1-minute intervals and join it with the sensor name from another table. But I’m wondering why the basic aggregate shows such big differences in execution speed. The timestamp is in milliseconds since the UNIX epoch.
From my understanding, an additional index won’t help? There are only about 1,000 rows with val = NULL in the whole set of 10 million records.
select
date_trunc('minute', to_timestamp(tn.ts / 1000)) as "time",
tn.id,
round(avg(tn.val::numeric), 1) as value
from
ts_number tn
where
tn.val is not null
group by
tn.id,
date_trunc('minute', to_timestamp(tn.ts / 1000))
order by
date_trunc('minute', to_timestamp(tn.ts / 1000));
I’ve tried different indexes and subqueries to narrow it down, but I cannot find the answer.
Execution plan for Postgres (jit=off, track_io_timing=true):
EXPLAIN (ANALYZE, BUFFERS, TIMING)
QUERY PLAN |
----------------------------------------------------------------------------------------------------------------------------------------+
Sort (cost=1438043.51..1448107.59 rows=4025633 width=44) (actual time=45605.997..46283.850 rows=3507690 loops=1) |
Sort Key: (date_trunc('minute'::text, to_timestamp(((ts / 1000))::double precision))) |
Sort Method: external merge Disk: 92136kB |
Buffers: shared hit=32 read=79640, temp read=56259 written=100732 |
I/O Timings: shared/local read=12067.837, temp read=1178.562 write=3915.268 |
-> HashAggregate (cost=707851.61..923882.68 rows=4025633 width=44) (actual time=29691.221..42565.892 rows=3507690 loops=1) |
Group Key: date_trunc('minute'::text, to_timestamp(((ts / 1000))::double precision)), id |
Planned Partitions: 32 Batches: 33 Memory Usage: 33041kB Disk Usage: 368648kB |
Buffers: shared hit=32 read=79640, temp read=44742 written=89197 |
I/O Timings: shared/local read=12067.837, temp read=1147.856 write=3727.306 |
-> Seq Scan on ts_number tn (cost=0.00..295394.36 rows=10785399 width=16) (actual time=6.016..18958.130 rows=10785155 loops=1)|
Filter: (val IS NOT NULL) |
Rows Removed by Filter: 1073 |
Buffers: shared hit=32 read=79640 |
I/O Timings: shared/local read=12067.837 |
Planning Time: 0.289 ms |
Execution Time: 46617.899 ms |
pg_tune settings:
# DB Version: 15
# OS Type: linux
# DB Type: dw
# Total Memory (RAM): 4 GB
# CPUs num: 4
# Connections num: 25
# Data Storage: ssd
max_connections = 25
shared_buffers = 1GB
effective_cache_size = 3GB
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 10485kB
min_wal_size = 4GB
max_wal_size = 16GB
max_worker_processes = 4
max_parallel_workers_per_gather = 2
max_parallel_workers = 4
max_parallel_maintenance_workers = 2
The host is a J4125 quad core with 8 GB RAM and a 1 TB SSD running Proxmox. Postgres runs in its own container (plain Debian Bullseye) with virtual resources: 4 cores, 4 GB RAM, 2 GB swap. No other containers are currently active.
Updated results:
Added a generated column
ts_time timestamp generated always as (date_trunc('minute', timestamp '1970-01-01 00:00:00' + make_interval(secs => ts/1000))) stored
and filled it with
update ts_number
set ts_time = date_trunc('minute', timestamp '1970-01-01 00:00:00' + make_interval(secs => ts/1000));
then created a new index
create index on ts_number(ts_time, id)
include (val)
where val is not null;
then EXPLAIN (ANALYZE, BUFFERS, TIMING)
QUERY PLAN |
-----------------------------------------------------------------------------------------------------------------------------------------------------------+
Finalize GroupAggregate (cost=260210.24..262870.90 rows=10200 width=20) (actual time=28025.740..34159.907 rows=3507690 loops=1) |
Group Key: ts_time, id |
Buffers: shared hit=86581 read=93231, temp read=31437 written=32171 |
I/O Timings: shared/local read=46057.413, temp read=121.378 write=1017.921 |
-> Gather Merge (cost=260210.24..262590.40 rows=20400 width=44) (actual time=28025.718..31512.629 rows=3738515 loops=1) |
Workers Planned: 2 |
Workers Launched: 2 |
Buffers: shared hit=86581 read=93231, temp read=31437 written=32171 |
I/O Timings: shared/local read=46057.413, temp read=121.378 write=1017.921 |
-> Sort (cost=259210.21..259235.71 rows=10200 width=44) (actual time=23712.198..24001.505 rows=1246172 loops=3) |
Sort Key: ts_time, id |
Sort Method: external merge Disk: 90712kB |
Buffers: shared hit=86581 read=93231, temp read=31437 written=32171 |
I/O Timings: shared/local read=46057.413, temp read=121.378 write=1017.921 |
Worker 0: Sort Method: external merge Disk: 78008kB |
Worker 1: Sort Method: external merge Disk: 76384kB |
-> Partial HashAggregate (cost=258429.08..258531.08 rows=10200 width=44) (actual time=20173.138..21185.291 rows=1246172 loops=3) |
Group Key: ts_time, id |
Batches: 5 Memory Usage: 262193kB Disk Usage: 7536kB |
Buffers: shared hit=86553 read=93231, temp read=799 written=1530 |
I/O Timings: shared/local read=46057.413, temp read=3.658 write=34.272 |
Worker 0: Batches: 1 Memory Usage: 237585kB |
Worker 1: Batches: 1 Memory Usage: 237585kB |
-> Parallel Seq Scan on ts_number tn (cost=0.00..224726.62 rows=4493662 width=16) (actual time=2.821..16943.198 rows=3595052 loops=3)|
Filter: (val IS NOT NULL) |
Rows Removed by Filter: 358 |
Buffers: shared hit=86553 read=93231 |
I/O Timings: shared/local read=46057.413 |
Planning: |
Buffers: shared hit=25 |
Planning Time: 22.619 ms |
Execution Time: 34999.933 ms                                                                                                                                 |
2 Answers
Comparisons between a professional tool that required billions of dollars of investment (SQL Server) and do-it-yourself DBMSs are generally out of proportion. This is why companies pay dearly for professional tools…
In this case, the reasons for such differences come down to the following facts:
There would still be a lot of other things to say on these subjects, and I’ll let you take a look at the article I wrote comparing PostgreSQL and SQL Server on this topic, where we note discrepancies going as far as more than 1000!
The index on all three columns isn’t going to help as your query can’t make use of it.
Casting the average in order to round it, rather than casting every value, should already make a difference, i.e.
round(avg(tn.val)::numeric, 1)
instead of
round(avg(tn.val::numeric), 1)
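Applied to the original query, only the aggregate expression changes (this is just the query from the question with the cheaper cast):
select
date_trunc('minute', to_timestamp(tn.ts / 1000)) as "time",
tn.id,
round(avg(tn.val)::numeric, 1) as value  -- cast the average once per group, not once per row
from ts_number tn
where tn.val is not null
group by tn.id, date_trunc('minute', to_timestamp(tn.ts / 1000))
order by date_trunc('minute', to_timestamp(tn.ts / 1000));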
However, the query still needs the costly hash aggregate and sort (and with your extremely small work_mem this spills to your apparently slow disk).
One way to avoid that is to support the aggregation with an index. Building the index is easier if you do the conversion from the Unix epoch to a proper timestamp using a generated column (which might be helpful in other situations as well):
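A sketch of such a column, reusing the ts_time name that appears in the question’s update (here without the date_trunc(), since the truncation is only moved into the column in the variant further below):
alter table ts_number
add column ts_time timestamp
generated always as (timestamp '1970-01-01 00:00:00' + make_interval(secs => ts / 1000)) stored;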
The somewhat strange-looking expression for the generated column is necessary because to_timestamp() is not an immutable function and thus can’t be used in a generated column (or index).
Then you can create such an index:
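A possible shape for that index, mirroring the grouping keys and the NOT NULL filter (a sketch, not the answer’s original DDL):
create index on ts_number (date_trunc('minute', ts_time), id)
include (val)
where val is not null;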
Which leads to this query:
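For example (a sketch that groups on the indexed expression):
select
date_trunc('minute', tn.ts_time) as "time",
tn.id,
round(avg(tn.val)::numeric, 1) as value
from ts_number tn
where tn.val is not null
group by date_trunc('minute', tn.ts_time), tn.id
order by date_trunc('minute', tn.ts_time);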
This should already be faster than your original query. If you always aggregate at "minute level", you could do that truncation directly in the generated column:
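That is the column definition shown in the question’s update (the date_trunc() is now part of the column):
ts_time timestamp generated always as (date_trunc('minute', timestamp '1970-01-01 00:00:00' + make_interval(secs => ts/1000))) stored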
which then leads to the simplified index:
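i.e. the index created in the question’s update:
create index on ts_number (ts_time, id)
include (val)
where val is not null;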
and the simplified query:
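A sketch of that query, grouping directly on the stored column:
select
tn.ts_time as "time",
tn.id,
round(avg(tn.val)::numeric, 1) as value
from ts_number tn
where tn.val is not null
group by tn.ts_time, tn.id
order by tn.ts_time;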
This brought the runtime down from initially ~45 seconds to ~5 seconds on my old laptop with a spinning hard disk.
This kind of table is one of the rare cases where PostgreSQL’s lack of "index-organized tables" (aka "clustered index" in SQL Server) is indeed a disadvantage, because the index needed to support this kind of query needlessly duplicates the storage (index + table).
Note that a work_mem of only 10MB is way too small if you are serious about doing larger aggregations with Postgres.
I don’t really know whether your SQL Server installation is also that severely limited with regard to the memory it can use for grouping and hashing, but given the numbers I’d suspect it can use substantially more memory.
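work_mem can be raised per session for the reporting queries, or as a server-wide default; the values below are only illustrative and still have to fit into the 4 GB of RAM with 25 allowed connections:
-- for the current session only
set work_mem = '256MB';
-- or change the server default (picked up after a config reload)
alter system set work_mem = '64MB';
select pg_reload_conf();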