I have a local PG instance populated with 10’s of millions of rows of data and need to run some relatively complex queries for some data analysis work.
These queries are currently taking 10+ minutes to return (even after adding custom index’s for every query I’m running).
While this may not be the "right tool for the job", I’m sure that my system isn’t fully utilizing its available resources. I’ve configured my system using PGTune, but it still seems like this is taking into account a margin of safety for application stability, multiple connections, competing processes, etc.
If I just want PG to run as fast as it possibly can for my single connection…
What are the most important settings? And how should they be configured, relative to the system specs?
(Mine are 8-core, 32GB ram for example)
2
Answers
If your statements are slow, you have to find the cause with
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
and improve that. Twiddling parameters will achieve less than you think.That said, the most important parameter for queries on big tables is
work_mem
. Set it as high as you can without going out of memory.The default for max_parallel_workers_per_gather is 2, which is too low for the situation you describe. It should be set equal to the number of CPUs (actually one less than the number of CPUs, but that isn’t likely to make any meaningful difference). But not all queries can benefit from parallel workers, so this might not make much difference to you.
High values for effective_io_concurrency can help if your queries involve bitmap heap scans and if your IO system can benefit from having multiple IO requests in flight at the same time (RAID/JBOD generally can. SSD (even single-drive systems) usually can if it is high quality. Even single-disk HDD can often get some smallish benefit).
Setting effective_cache_size to the same size as all of RAM (or just slightly less) can help for some queries.
Increasing work_mem can help. But be careful not to overdue, even a single session can allocate many multiples of work_mem if it involves sorts in many different executor nodes, or parallel workers, or partitions. Although the "spill to disk" algorithms are now so good that this often doesn’t make a big difference.