I couldn’t find a definitive answer to my concerns, so I might as well ask you guys!
Long story short:
We need to perform an UPDATE command on roughly 400M rows. I know the command could be modified to work in batches, but that is a different topic. Our problem is that the WAL gets too big and we run out of disk space.
I’m wondering how the checkpoint intervals work with different WAL levels. To put it simply, the documentation says that a longer checkpoint interval "triggers" fewer full page writes, which results in a smaller WAL. What I can’t find is how this behaves with different wal_level settings.
DB version: Postgres 14.4
1. Does it have any relevance with a minimal wal_level setting? (Considering it removes almost all logging.)
2. Does it break the replicas when the wal_level is set to replica or higher? (It isn’t obvious to me based on different articles and the documentation, but I assume the replicas should be fine, since all the changes are still logged despite fewer full page/block writes, and it can also be beneficial, i.e. decreased WAL size.)
We are in a position where a full backup and a shutdown of the related application are possible, so the minimal wal_level setting could work, but I’m interested in different solutions as well; feel free to share some thoughts on it.
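For context, the batching approach mentioned above would look roughly like this (a sketch only; the table, column, and batch size are made-up placeholders):

```sql
-- Hypothetical batched UPDATE: "my_table", "id", and "status" are placeholders.
-- Each batch is its own transaction, so WAL can be recycled between batches
-- instead of accumulating for one giant statement.
UPDATE my_table
SET    status = 'migrated'
WHERE  id IN (
    SELECT id
    FROM   my_table
    WHERE  status <> 'migrated'
    LIMIT  50000
);
-- Repeat (e.g. from a driver script) until the UPDATE reports 0 rows affected.
```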
Cheers!
3 Answers
wal_level = minimal won’t make a difference. As long as you don’t set it to logical, PostgreSQL should produce about the same amount of WAL. It will break replication if you set wal_level to something lower than replica.

The obvious solution is to add more disk space. If the problem is WAL archiving, you can disable archive_mode. If the problem is that checkpoints take too long to complete, you could run a manual CHECKPOINT command.

Increase max_wal_size to reduce the amount of WAL written. Yes, I know that it sounds strange, but max_wal_size does not govern the size of pg_wal; rather, it governs when checkpoints are triggered (and more frequent checkpoints increase the number of full page images written).

It doesn’t. With minimal, you only skip WAL logging of a few things, like COPY into a table that was created or truncated in the same transaction, or the creation of indexes. Those special cases wouldn’t apply to a bulk UPDATE.
To solve the problem, you first need to figure out what the root problem is. Are you so close to the out-of-space condition under normal conditions that any stress at all can push you over? Do you have replication slots, and the standbys can’t keep up? Do you have an archive_command that can’t keep up? Is your I/O system so overwhelmed that the checkpoints can’t finish in time despite trying as fast as they can? Is your max_wal_size writing checks your hard drive can’t cash?
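A few catalog and statistics queries can help narrow down which of these it is (a sketch; these views exist in Postgres 14, but interpret the numbers in the context of your own setup):

```sql
-- Are replication slots holding WAL back?
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM   pg_replication_slots;

-- Is archive_command keeping up?
SELECT archived_count, failed_count, last_failed_wal
FROM   pg_stat_archiver;

-- Mostly requested (not timed) checkpoints suggest max_wal_size is too small.
SELECT checkpoints_timed, checkpoints_req
FROM   pg_stat_bgwriter;
```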
Make sure about a few points:
1. checkpoint_timeout and max_wal_size are not too small.
2. archive_command is working.
These points are important to avoid flooding the I/O system.
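As a rough sketch, those settings could be inspected and raised like this before the bulk UPDATE (the values are illustrative, not recommendations; tune them for your hardware):

```sql
SHOW checkpoint_timeout;
SHOW max_wal_size;

-- Illustrative values only:
ALTER SYSTEM SET max_wal_size = '16GB';
ALTER SYSTEM SET checkpoint_timeout = '30min';
SELECT pg_reload_conf();  -- both settings take effect on reload, no restart needed
```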