I have a Postgres 11 database that grew faster than expected, ultimately to around 500 million rows, and after optimizing our code we were able to get this down to around 10 million rows at any given time (around 1,000-5,000 inserts per second on average).
The problem is that after removing 490 million rows, auto vacuum seems to constantly be running, and queries still take almost as long as they did before removing 490 million rows.
As downtime is not really an option for us, are there any suggestions on how we could optimize things?
- Should I remove any indexes on the table and recreate them? Will this help at all?
- Since auto vacuum seems to constantly run, and we can’t do a vacuum full (can’t have downtime if possible), any recommendations around this?
I posted another question about adding a primary key to use AWS DMS to upgrade to a newer version of Postgres where I believe VACUUM can run in parallel (hopefully that will help): Postgres add PRIMARY KEY to partitioned very active table without lock
I'm kind of running out of options and hoping someone may have some suggestions 🙁
Here's the output from just a simple query for 1000 rows that takes ~18-19 seconds:
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) SELECT * FROM "public"."table" LIMIT 1000 OFFSET 0;
Limit  (cost=0.00..46.92 rows=1000 width=128) (actual time=18452.685..18468.174 rows=1000 loops=1)
  Output: difficulty, user, worker, useragent, ipaddress, source, created
  Buffers: shared hit=2585 read=322681 dirtied=321175 written=102
  I/O Timings: read=13292.325 write=0.399
  ->  Seq Scan on public.shares_nexa  (cost=0.00..14031593.64 rows=299050464 width=128) (actual time=18452.683..18468.080 rows=1000 loops=1)
        Output: difficulty, user, worker, useragent, ipaddress, source, created
        Buffers: shared hit=2585 read=322681 dirtied=321175 written=102
        I/O Timings: read=13292.325 write=0.399
Planning Time: 0.056 ms
Execution Time: 18468.237 ms
n_dead_tup for this specific table shows 556993041.
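That number comes from the pg_stat_user_tables statistics view; a query along these lines shows it (table name taken from the plan above):

SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'shares_nexa';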
3 Answers
For reference, the way we were able to solve this was a combination of things. Thanks to @jjanes' answer I dug further into autovacuum_work_mem, which I found was set to -1, which tells the db to use maintenance_work_mem. What I did was:
- Raised autovacuum_work_mem up to 1GB
- Raised autovacuum_vacuum_cost_limit up to 2000
- Raised IOPS in AWS up to 19000
After doing this and scaling the IOPS up to the max, the vacuum process was able to complete without constantly restarting itself. Once it finished, we were able to scale the IOPS back down to a normal level.
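On a self-managed Postgres the first two changes would look something like this (on RDS they go through the DB parameter group instead, and the IOPS change is made on the storage itself); the values are the ones listed above:

ALTER SYSTEM SET autovacuum_work_mem = '1GB';
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
SELECT pg_reload_conf();  -- both settings take effect on a config reload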
Another option that worked as well in a test environment was, inside of a transaction, creating a new table, dropping the original, and then renaming the new one to the original table name:
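The exact statements aren't reproduced here, but a minimal sketch of that approach looks like this (table name from the plan above; whether to copy any surviving rows into the new table first is up to you):

BEGIN;
CREATE TABLE public.shares_nexa_tmp (LIKE public.shares_nexa);
-- optionally copy over whatever live rows still need to be kept here
DROP TABLE public.shares_nexa;
ALTER TABLE public.shares_nexa_tmp RENAME TO shares_nexa;
COMMIT;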
Doing this, we were then able to re-add the indexes after it was completed. If you want to copy the indexes and everything else, use this instead when creating the tmp table:
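Presumably that means the LIKE ... INCLUDING ALL form, which also copies indexes, defaults, and constraints from the original (again a sketch, not the exact statement from the answer):

CREATE TABLE public.shares_nexa_tmp (LIKE public.shares_nexa INCLUDING ALL);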
You are likely suffering from index bloat.
Try rebuilding and/or reorganizing the indexes.
You can use the REINDEX command to rebuild all indexes on a table or a specific index, or the CLUSTER command to physically reorder the table data to match the order of the index.
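For example (the table name comes from the plan above and the index name is an assumption; note that both commands take heavy locks, so they need a quiet window):

REINDEX TABLE public.shares_nexa;
-- or physically reorder the heap to follow a chosen index:
CLUSTER public.shares_nexa USING shares_nexa_created_idx;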
Also increase the autovacuum settings. Increasing autovacuum_vacuum_scale_factor and autovacuum_analyze_scale_factor means more data is vacuumed and analyzed during each run (at the cost of runs triggering less often).
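These can also be overridden per table; a sketch with illustrative values (the scale factor is roughly the fraction of the table that has to change before autovacuum/autoanalyze fires, so lower values trigger runs sooner and higher values less often):

ALTER TABLE public.shares_nexa
SET (autovacuum_vacuum_scale_factor = 0.05,
     autovacuum_analyze_scale_factor = 0.02);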
Hope this fixes your problem.
You say that autovac is constantly running, but it isn't clear why this should be. It could be that autovac is highly throttled, and so it just takes a long time to do its thing and you haven't seen it finish yet. Or it could take a large number of passes over the indexes, because autovacuum_work_mem is too low and so it can only clean up so many tuples per pass (and of course each of those compounds the other). Or maybe the autovac is constantly getting interrupted by competing locks before it makes any real progress, which would show up in the log files. A manual vacuum could solve all of these, as it would run unthrottled (under the default settings, anyway), will use whatever local setting you have for maintenance_work_mem (rather than autovacuum_work_mem), and will not automatically cancel itself when confronted with blocking processes.
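A minimal sketch of that, using the table from the plan above and an illustrative memory setting:

SET maintenance_work_mem = '1GB';  -- a manual VACUUM uses this setting, per session
VACUUM (VERBOSE) public.shares_nexa;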
You can move some of the tuples from the end of the table to the front of the table. This means the query you showed us would find those ones at the front right away, and so end early. It is hard to know what other types of queries might do without seeing those queries. (Your setting of synchronize_seqscans might interfere with this. If that is set to on, the default unless RDS messed around with it, then this shouldn't be such a big problem to start with, as the query would just pick up where the last one left off, which should be in the populated part of the table, and so the problem should only happen about 1000/10e6 of the time, when it needs to span the vast empty region.)
You can do this tuple movement with something like:
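The statement itself isn't reproduced here; the idea is a no-op UPDATE on rows near the end of the heap, which forces new row versions that land in the free space near the front (the block cutoff and the column used are illustrative; the block number is pulled out of ctid via a text/point cast):

UPDATE public.shares_nexa
SET difficulty = difficulty                -- no-op change, just forces a new row version
WHERE (ctid::text::point)[0] >= 3000000;   -- only rows living in heap blocks at or after this block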
Make sure any triggers you have on the table will do the right thing in the face of such a procedure, or disable them.
Ideally you would do this repeatedly, lowering the ctid constant each time until all the tuples were moved to the front (you would want to do it in small chunks to minimize the amount of time any given tuple is locked, plus so that if you end up needing to cancel it you don't lose all the work from previous batches), and then VACUUM would truncate the table to a smaller size. This operation would make any bloat on the indexes only worse, so you then would need to reindex (concurrently) to get rid of that if it is bothersome.
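A sketch of those follow-up steps, again with assumed index names (REINDEX CONCURRENTLY only arrived in Postgres 12, so on 11 the usual workaround is to build a replacement index concurrently and swap it in):

VACUUM (VERBOSE) public.shares_nexa;   -- lets Postgres truncate the now-empty tail of the table
CREATE INDEX CONCURRENTLY shares_nexa_created_idx_new ON public.shares_nexa (created);
DROP INDEX CONCURRENTLY shares_nexa_created_idx;
ALTER INDEX shares_nexa_created_idx_new RENAME TO shares_nexa_created_idx;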