Let’s say master and replica are in sync, and after sometime replica goes down and looses connectivity with master.
When replica comes up again, how does it know what partial data it needs to request for?
And also, if by some logic replica is able to ask the partial data it needs – how does master node respond by giving that partial data? My understanding is master node sends the RDB file to replica, how can it send partial RDB file?
2
Answers
Replica maintains the offset till which it has received the data from the master in it’s RDB file. So when Replica loses connection and comes up later it knows from which offset to ask the data for.
During the period the master loses the connection with the slave, a buffer on the Redis master, keeps track of all recent write commands: this buffer is called replication backlog.
Redis uses this backlog buffer to decide whether to start a full or partial data resynchronization.
A replica always begins with asking for partial resync (because it is more efficient than full resync) using its last offset. Master checks if the offset from which data is requested from replica, is retrievable from its backlog buffer or not. If the offset is in the range of the backlog, all the write commands during disconnection could be obtained from it, which indicates that a partial resynchronization can be done and the master approves and begins the partial resync.
On the other hand, if the connection was lost for a long time and the buffer became full on the master side, partial resync is not possible and master rejects it and begins the complete resync.
The buffer size is called:
repl-backlog-size
and its default size is1MB
For a system with High Wirtes: 1MB of repl-backlog-size will fill the buffer very quickly and will result in full resync even if replica loses connection for few seconds.
Another parameter:
repl-backlog-ttl
whose default value is1hour
determines how long the master Redis instance will wait to release the memory of the backlog if all the replicas get disconnected. So let's say your replica got disconnected by more than 1 hour and buffer is filled with only 100KB of data, it will result is complete re-sync as master would discard its buffer as it cannot hold it beyond 1 hour.https://redis.io/docs/management/replication/#how-redis-replication-works
Sending an RDB image is only used for a full sync.
For partial sync, the replicas keep track of their position in the replication log (which is initialized when they do a full sync, and then incremented every time they replicate a command). If the replica loses its connection and has to resync, it tells the master what its last valid sync offset was, and the master simply has to replay the portion of the replication log after that offset. For that purpose, it buffers the most recent log entries in memory. If the replica is too far behind (the log has accumulated more than
repl-backlog-size
bytes of transactions since the replica disconnected), then a partial sync isn’t possible and the master forces it to do a full sync instead.