We use postgresql as a database. For the backup part, we store full cluster backups and wal files to allow point in time restoration of our database.
Our wal files take up quite a bit of space when compared to our full backups, so we would like to inspect the content of our wal files, and more precisely be able to see what tables contribute the most to the volume of our wal files.
Question
Is there a way to inspect what tables/relations are targeted by the actions stored in a specific wal file ? and how many rows or records are affected ?
2
Answers
pg_waldump
is the tool you’d want to use to inspect the contents of a WAL file. Basically, you can look to see which DML and DDL statements (along with other events) were captured into the WAL file.One example is below:
Bear in mind that in WAL, tables are referred to by their
relfilenode
and not by theiroid
(look for16440
in the example)More information about
pg_waldump
can be found in the documentationI would use a radically different approach to solve your problem. WAL is binary information, and the connection between WAL records and tables is tenuous: for example, if you
TRUNCATE
a table, it gets a different file, and you cannot figure out any more what table an old WAL entry belongs to.But you can easily see how many rows get inserted, updated and deleted in each table:
These counts are cumulative and include all activity since the last statistics reset or crash, so you would run the same query on two consecutive days (or weeks) and calculate the difference to get an idea how many data modifications each table receives. The more data modifications, the more WAL.
This is of course a very coarse estimate. For one, it will depend on the row size. You could estimate the row size with
You can factor that size into the calculation. Again, that row size is a very coarse estimate.