I have a table with jsonb column, turns out storing a lot of json data in this column per row is taking too much space, with around 90M records this table has consumed around 167GB storage.
Now I want to remove this column from the database and reclaim the storage. I went through the concept of VACUUM from PSQL documentation here. From the documentation I understood that postgres automatically runs a routine vacuum job to reclaim the storage periodically based on criterion as mentioned in the documentation.
I have some confusion about this line:
Plain VACUUM (without FULL) simply reclaims space and makes it available for re-use. This form of the command can operate in parallel with normal reading and writing of the table, as an exclusive lock is not obtained. However, extra space is not returned to the operating system (in most cases); it’s just kept available for re-use within the same table.
I have few doubts here, from the documentation I read FULL vacuum requires extra space, now since my database does not have that kind of space available, want to make sure does auto vacuum also have same requirement? Do I need to upgrade the space before applying these changes (dropping the column)? Is there a way to validate this in local?
2
Answers
If the drive space is mainly needed for that same table to re-use, just drop the column and
vacuum
the table (without thefull
) or leave it for auto-vacuum to come around, and that’s it.Unless you took extra steps to disable auto-vacuum, it’s been running this whole time on your server:
pg_stat_all_tables
system view can tell you when it took place most recently. See the server log to check it reported failures. You can see its progress inpg_stat_progress_vacuum
.As outlined in the doc you quoted, regular
vacuum
reclaims space only for that same table to use (unless it frees whole pages at the end), so you typically won’t see as much of a drop in total space consumption as you would in case ofvacuum full
. It performs an in-place, internal table cleanup, aiming to retain its size, without any attempt to compact it.If your estimation of the table size is based on
pg_total_relation_size()
, there’s space to be reclaimed not just in/from the table pages, but also its TOAST and indexes.Indexes are related but separate objects, so space they occupied will be just freed. If you have an index on that one column, you’ll be able to reclaim the entire space it used. Indexes using that column with some others will have to be dropped and redefined, which also means some additional free space due to compaction on rewrite as well as lighter structure overall – as long as you have the space to perform it. If you don’t, you can save the index definitions before dropping them:
And once you have the space, re-build them (after editing out the column you’re getting rid of)
All
vacuum
by default runs withPROCESS_TOAST=true
so your TOAST can shrink as well. When considering avacuum full
after you drop some weight you can consider full-vacuuming just the main table by settingPROCESS_TOAST=false
, or just the TOAST by picking it out of the systempg_class
:If/when you do
vacuum full
, you can track its progress inpg_stat_progress_cluster
. You can also run it withverbose
.You don’t. The
alter table...drop column
on its own doesn’t cause a re-write, so it doesn’t need space. The subsequent auto-vacuum won’t attempt a re-write so it doesn’t need it either.You can set up a mirror of that system, replicate the table through
pg_dump
|
psql
, populate additional tables with dummy records so that you also have exactly the same amount of space left, and run your tests there. Re-writing the table to another system will compact it, so it’s not entirely apples-to apples comparison but it’s a safe playground to test on.Vacuum full needs enough space temporarily for a complete copy of the new table and new copies of indices. Presumably that will be much less than the old table as the obsolete data is not copied. Any indexes you can drop before hand will have the triple benefit of freeing up space the old copy takes, nott needing to make a new copy, and shortening the time the whole thing takes. If the index is not absoluely essential, you can put the table back into service while the index rebuilds.
Dropping a column just followed up with a non-FULL vacuum won’t free up any space for reuse. For any rows updated after the drop, once the old row version is vacuumed away that will free up space. You could take advantage of this by doing a dummy update of some small fraction of the rows followed by manual vacuum.