
Is it possible to run PostgreSQL 11’s VACUUM FULL for a short while and then get some benefit? Or does cancelling it midway cause all of its progress to be lost?

I’ve read about pg_repack (https://aws.amazon.com/blogs/database/remove-bloat-from-amazon-aurora-and-rds-for-postgresql-with-pg_repack/), but the way it works (creating new tables, copying data, and so on) sounds risky to me. Is that just paranoia, or is it safe to use on a production database?

Backstory: I am working with a very large production database on AWS Aurora PostgreSQL 11. Many of the tables had tens of millions of records but have been pruned down significantly. The problem is that the table sizes on disk (and in the snapshots) have not decreased because DELETE and VACUUM (without FULL) do not shrink the files. These tables are in the hundreds of gigabytes range and I’m afraid running VACUUM FULL will take forever.
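
One way to check how much space a table and its indexes currently occupy on disk (the table name here is a placeholder):

    -- table data, index data, and the combined total for one table
    select pg_size_pretty(pg_table_size('mytable'))           as table_size,
           pg_size_pretty(pg_indexes_size('mytable'))         as index_size,
           pg_size_pretty(pg_total_relation_size('mytable'))  as total_size;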

2 Answers


  1. No. VACUUM FULL rewrites the table into a new physical file. Cancelling it before it finishes discards all of the work done so far.

    The manual:

    VACUUM FULL rewrites the entire contents of the table into a new disk file with no extra space, allowing unused space to be returned to the operating system. This form is much slower and requires an ACCESS EXCLUSIVE lock on each table while it is being processed.

    This is the main reason community tools like pg_repack and pg_squeeze were created: they are more flexible, less blocking, and often faster. (I don’t think pg_squeeze is available on Aurora yet.)
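
    For orientation, here is a minimal sketch of a typical pg_repack run against a single table, based on the AWS blog linked in the question; the host, user, database, and table names are placeholders, and the flags should be checked against the pg_repack version available on your Aurora instance:

        -- inside the target database first (extension and client versions must match)
        create extension pg_repack;

        $ pg_repack -h mycluster.example.com -U myuser -d mydb -k -t mytable

    The -k (--no-superuser-check) flag matters on RDS/Aurora because the master user is not a true superuser.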

  2. pg_repack might be a bit of overkill. You can instead just delete tuples from the end of the table and reinsert them toward the front (reusing space already marked as free by an earlier VACUUM); an ordinary VACUUM can then truncate away the empty space at the end of the table.

        -- delete every row at or beyond block 50000 and re-insert it into
        -- free space earlier in the table; the block number is an example cutoff
        with d as (delete from mytable where ctid >= '(50000,1)' returning *)
        insert into mytable select * from d;
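
    Once the rows have been moved toward the front, a plain VACUUM should be able to return the now-empty tail to the operating system; VERBOSE reports how many pages were truncated:

        vacuum (verbose) mytable;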
    

    You can use the pg_freespacemap extension to figure out a good starting point for the ctid criterion.
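
    As a sketch, the pages that still have reusable free space can be listed like this (mytable is a placeholder; comparing blkno against relpages in pg_class shows how far into the table the free space sits):

        create extension if not exists pg_freespacemap;

        select blkno, avail
        from pg_freespace('mytable')
        where avail > 0
        order by blkno
        limit 20;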

    This might not behave well if you have triggers or FK constraints, and it might bloat the indexes such that they would need to be rebuilt (but they probably need that anyway). It will also lock a large number of rows at a time, for however long the re-insert takes to run and commit.
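
    If the indexes do need rebuilding, note that v11 has no REINDEX CONCURRENTLY (that arrived in v12); one common workaround is to build a replacement index concurrently and swap the names. The index and column names below are illustrative, and this does not work directly for indexes that back constraints:

        create index concurrently mytable_col_idx_new on mytable (col);
        drop index concurrently mytable_col_idx;
        alter index mytable_col_idx_new rename to mytable_col_idx;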

    Improvements made since v11 make this ctid scan more efficient than it is in v11.
