skip to Main Content

PostgreSQL v12. Database cluster size: 12+ Tb (30+ databases 100-600Gb each). Server: 40 cores, 128Gb ram.

For some reason, autovacuum stopped running. How to understand the reason?
Yesterday, the workers were running every minute for each base (according to the settings), but then they stopped and are no longer running.

Settings:

autovacuum = on

autovacuum_max_workers = 4
autovacuum_naptime = 1min

autovacuum_vacuum_threshold = 50
autovacuum_analyze_threshold = 50
autovacuum_vacuum_scale_factor = 0.10
autovacuum_analyze_scale_factor = 0.05

autovacuum_vacuum_cost_delay = 0
autovacuum_vacuum_cost_limit = 1000

For example, here is a table in one of the databases:

relname: ods_storage_sync_queue
n_live_tup: 8
n_dead_tup: 180542
autovacuum_count: 74167
last_autovacuum: 2023-09-26 18:48:11.868
last_autoanalyze: 2023-09-26 18:48:11.869 

The table contains an infinite number of dead tuples, but autovacuum does not clean it.

Another example:

relname: ods_storage_sync_queue
n_live_tup: 11434
n_dead_tup: 2064
autovacuum_count: 16727
last_autovacuum: 2023-09-26 18:13:21.612
last_autoanalyze: 2023-09-26 18:47:11.530 

There are 18% dead tupples. According to autovacuum_vacuum_scale_factor = 10%, autovacuum should have started a long time ago, but a day has passed… nothing happens, although yesterday everything started once a minute, as set in autovacuum_naptime.

What else can I check? How to understand the reason?

2

Answers


  1. Chosen as BEST ANSWER

    The described problem was solved by updating PostgreSQL from 12.1 to 12.16. We found in Release Notes that somewhere along the way from 12.1 to 12.16 an error was fixed that caused autovacuum to be stuck.


  2. The one possibility is that autovacuum really does not run. You should set log_autovacuum_min_duration = 0 and examine your log file to see if that is the case. Reasons for autovacuum not starting are

    • The statistics collector process has a problem. That is not the case here, since you see the number of dead tuples increasing.

    • All 4 slots of autovacuum_max_workers are already taken by long-running autovacuum workers, and other tables that need VACUUM have to wait until one of them finishes. You can diagnose that with pg_stat_activity, and the solution would be to increase autovacuum_max_workers.

    However, it is likely that autovacuum does run, but is prevented from cleaning up dead tuples. As detailed in this article, there are four possible reasons:

    • a concurrent long running transaction

    • a stale replication slot

    • an orphaned prepared transaction

    • a lagging standby with hot_standby_feedback = on

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search