skip to Main Content

I have a table with jsonb column, turns out storing a lot of json data in this column per row is taking too much space, with around 90M records this table has consumed around 167GB storage.

Now I want to remove this column from the database and reclaim the storage. I went through the concept of VACUUM from PSQL documentation here. From the documentation I understood that postgres automatically runs a routine vacuum job to reclaim the storage periodically based on criterion as mentioned in the documentation.

I have some confusion about this line:

Plain VACUUM (without FULL) simply reclaims space and makes it available for re-use. This form of the command can operate in parallel with normal reading and writing of the table, as an exclusive lock is not obtained. However, extra space is not returned to the operating system (in most cases); it’s just kept available for re-use within the same table.

I have few doubts here, from the documentation I read FULL vacuum requires extra space, now since my database does not have that kind of space available, want to make sure does auto vacuum also have same requirement? Do I need to upgrade the space before applying these changes (dropping the column)? Is there a way to validate this in local?

2

Answers


  1. If the drive space is mainly needed for that same table to re-use, just drop the column and vacuum the table (without the full) or leave it for auto-vacuum to come around, and that’s it.

    since my database does not have that kind of space available, want to make sure does auto vacuum also have same requirement

    Unless you took extra steps to disable auto-vacuum, it’s been running this whole time on your server: pg_stat_all_tables system view can tell you when it took place most recently. See the server log to check it reported failures. You can see its progress in pg_stat_progress_vacuum.

    As outlined in the doc you quoted, regular vacuum reclaims space only for that same table to use (unless it frees whole pages at the end), so you typically won’t see as much of a drop in total space consumption as you would in case of vacuum full. It performs an in-place, internal table cleanup, aiming to retain its size, without any attempt to compact it.

    If your estimation of the table size is based on pg_total_relation_size(), there’s space to be reclaimed not just in/from the table pages, but also its TOAST and indexes.

    Indexes are related but separate objects, so space they occupied will be just freed. If you have an index on that one column, you’ll be able to reclaim the entire space it used. Indexes using that column with some others will have to be dropped and redefined, which also means some additional free space due to compaction on rewrite as well as lighter structure overall – as long as you have the space to perform it. If you don’t, you can save the index definitions before dropping them:

    create temp table index_definitions as select indexdef 
      from pg_indexes where schemaname='your_schema' and tablename='your_table';
    

    And once you have the space, re-build them (after editing out the column you’re getting rid of)

    do $p$ declare indexdef_ text; 
    begin for indexdef_ in select indexdef from index_definitions loop
              execute indexdef_;
          end loop;
          drop table index_definitions;
    end $p$;
    

    All vacuum by default runs with PROCESS_TOAST=true so your TOAST can shrink as well. When considering a vacuum full after you drop some weight you can consider full-vacuuming just the main table by setting PROCESS_TOAST=false, or just the TOAST by picking it out of the system pg_class:

    select reltoastrelid::regclass,  pg_size_pretty(pg_relation_size(reltoastrelid)) 
    from pg_class
    where reltoastrelid<>0 and 'r'=relkind
    and (oid::regclass)::text='your_schema.your_table';
    

    If/when you do vacuum full, you can track its progress in pg_stat_progress_cluster. You can also run it with verbose.


    Do I need to upgrade the space before applying these changes (dropping the column)?

    You don’t. The alter table...drop column on its own doesn’t cause a re-write, so it doesn’t need space. The subsequent auto-vacuum won’t attempt a re-write so it doesn’t need it either.


    Is there a way to validate this in local?

    You can set up a mirror of that system, replicate the table through pg_dump|psql, populate additional tables with dummy records so that you also have exactly the same amount of space left, and run your tests there. Re-writing the table to another system will compact it, so it’s not entirely apples-to apples comparison but it’s a safe playground to test on.

    Login or Signup to reply.
  2. Vacuum full needs enough space temporarily for a complete copy of the new table and new copies of indices. Presumably that will be much less than the old table as the obsolete data is not copied. Any indexes you can drop before hand will have the triple benefit of freeing up space the old copy takes, nott needing to make a new copy, and shortening the time the whole thing takes. If the index is not absoluely essential, you can put the table back into service while the index rebuilds.

    Dropping a column just followed up with a non-FULL vacuum won’t free up any space for reuse. For any rows updated after the drop, once the old row version is vacuumed away that will free up space. You could take advantage of this by doing a dummy update of some small fraction of the rows followed by manual vacuum.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search