I’m trying to understand more about the apacheAGE extension therefore I’m reading the inner workings of PostgreSQL. From what I understand every operation that alters the table is written at the WAL buffer, but after it is commited/aborted it is immediately written to the WAL segment file on the storage.
Why is the first part needed? Isn’t having 2 steps more time-consuming, since the WAL segment file is enough by itself to recover from a server crash?
3
Answers
For a transaction involving a single change on a system with a single user, yes, it would be better to write directly to the file. But if you have many changes in a transaction, and many concurrent sessions running transactions, writing each one directly to storage is terribly inefficient because each log file write takes a millisecond or more.
WAL buffers is just a cache for WAL; eventually the data are written to the WAL segment files. Like with all caches, the goal is to boost performance.
The Write-Ahead Logging (WAL) mechanism in PostgreSQL is a critical component for ensuring data consistency and durability. The WAL buffers serve as a temporary storage area for changes made to the database, allowing them to be quickly written to disk and later replayed in case of a crash or other failure.
The reason why changes are first written to the WAL buffers before being committed to the WAL segment files on disk is to ensure that they are quickly and efficiently captured in memory. This provides a performance benefit since writing to disk can be a slow and expensive operation. By buffering the changes in memory first, PostgreSQL can achieve better throughput and response times.
Once the changes are committed, they are then written to the WAL segment files on disk, which provides durable storage of the changes. This ensures that in the event of a server crash, the database can be restored to a consistent state by replaying the WAL records from the most recent checkpoint.
So, having two steps may seem more time-consuming, but it is an important mechanism for ensuring that changes are captured quickly and reliably, while also providing durability in case of failures. The combination of in-memory buffering and disk-based storage provides a good balance of performance and reliability for PostgreSQL.