We’re using Debezium as a change data capture connector for PostgreSQL (actually RDS).
If for any reason the debezium connector has a terminal failure, the WAL will continue to grow until either debezium is fixed, or the server runs out of storage space.
Is it possible to set a limit on the size a WAL can grow?
I’ve looked into https://postgresqlco.nf/doc/en/param/max_wal_size/ and am not convinced it will do what we need.
Basic architecture is:
Events > Postgres > debezium > Events
In an ideal world if WAL hits a limit events hitting postgres will error as the wal log cannot be written to. Upon restoring debezium and it picking up the WAL log it should read/clear and continue then to process new events.
Thank you for any help.
2
Answers
If I understood you correctly, there is no way to do set a limit to WAL size. As long as there is an activity(modification) in your database, the wal logs will be generated; otherwise your database processes will be halted. Only limitation is of course is the storage size, but if you hit that limit as I mentioned database will be halted. This is my answer to your question.
In addition to your question, I don’ t know how Debezium works, but in Oracle GoldenGate there was a concept called data pumps. In order to deal woth this kind of scenario you need to transfer CDC logs to target server or another server if you have a strict storage limitation on the database server. Afterwards, Debezium needs to read from the file and apply or streams the changes on the target side if there is similar approach in Debezium. So, the CDC replication/stream is not directly from database server to target. It should be from database server to the file and then from file to target.
Simply set the PostgreSQL configuration parameter
max_slot_wal_keep_size
to the desired limit of WAL you eant to retain. If that size is exceeded, PostgreSQL will remove WAL and you will have to rebuild the replication.