Redis multi sentinel not elect new redis master after failure

Jividijuvent
January 15, 2020
161 views
3 votes
3 Answers

i have a three node redis and 3 node sentinel, everything is ok and all master and slaves are verified and sentinel config file is updated with all redis and sentinel nodes, but the problem is when redis master is down and sentinel wants to elect the failed master again and wont elect between other slaves to choose new master,here is my config files and logs.

vm1: redis master and sentinel1 192.168.1.48

vm2: redis slave and sentinel2 192.168.1.51

vm3: redis slave and sentinel3 192.168.1.52

Redis master conf: (vm1)

bind 192.168.1.48 127.0.0.1 
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 60
daemonize no
supervised systemd 
pidfile /var/run/redis_6379.pid
loglevel notice
logfile ""
databases 16
always-show-logo yes
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/redis 
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
replica-priority 100
requirepass 123456789
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes

one of sentinel configs: (vm1)

#1
bind 0.0.0.0
port 26379
#2
sentinel myid 7e09f70bc68cdc0afee3d8cd9bdf3fe6f320a3d5
sentinel deny-scripts-reconfig yes
sentinel monitor redis-cluster 192.168.1.48 6379 3
sentinel down-after-milliseconds redis-cluster 5000
#3
sentinel failover-timeout redis-cluster 1000
sentinel parallel-syncs redis-cluster 1

#misc
daemonize yes
pidfile "/var/run/redis_26379.pid"
logfile "/var/log/redis_26379.log"
dir "/var/lib/redis"

##############
# Generated by CONFIG REWRITE
protected-mode no
sentinel auth-pass redis-cluster 123456789
sentinel config-epoch redis-cluster 0
sentinel leader-epoch redis-cluster 52
sentinel known-replica redis-cluster 192.168.1.51 6379
sentinel known-replica redis-cluster 192.168.1.52 6379
sentinel known-sentinel redis-cluster 192.168.1.52 26379 0b37aa7287e89ad38a90a97cdff16c22793678a6
sentinel known-sentinel redis-cluster 192.168.1.51 26379 9d097bb22ffdf87c7f8a403a8dc82c989790cf3b
sentinel current-epoch 52

another Sentinel config: (vm2)

#1
bind 0.0.0.0
port 26379
#2
sentinel myid 9d097bb22ffdf87c7f8a403a8dc82c989790cf3b

daemonize yes
pidfile "/var/run/redis_26379.pid"
sentinel deny-scripts-reconfig yes
sentinel monitor redis-cluster 192.168.1.48 6379 2
sentinel down-after-milliseconds redis-cluster 5000
sentinel failover-timeout redis-cluster 10000
#3
sentinel auth-pass redis-cluster 123456789
#misc
logfile "/var/log/redis_26379.log"
dir "/var/lib/redis"
# Generated by CONFIG REWRITE
protected-mode no
sentinel config-epoch redis-cluster 0
sentinel leader-epoch redis-cluster 28811
sentinel known-replica redis-cluster 192.168.1.51 6379
sentinel known-replica redis-cluster 192.168.1.52 6379
sentinel known-sentinel redis-cluster 192.168.1.52 26379 0b37aa7287e89ad38a90a97cdff16c22793678a6
sentinel known-sentinel redis-cluster 192.168.1.48 26379 7e09f70bc68cdc0afee3d8cd9bdf3fe6f320a3d5
sentinel current-epoch 28811

Sentinel log after master failure: (vm2)

2692:X 15 Jan 2020 09:19:42.576 # +vote-for-leader 0b37aa7287e89ad38a90a97cdff16c22793678a6 28804
2692:X 15 Jan 2020 09:19:42.582 # Next failover delay: I will not start a failover before Wed Jan 15 09:20:02 2020
2692:X 15 Jan 2020 09:20:02.659 # +new-epoch 28805
2692:X 15 Jan 2020 09:20:02.660 # +try-failover master redis-cluster 192.168.1.48 6379
2692:X 15 Jan 2020 09:20:02.662 # +vote-for-leader 9d097bb22ffdf87c7f8a403a8dc82c989790cf3b 28805
2692:X 15 Jan 2020 09:20:02.674 # 0b37aa7287e89ad38a90a97cdff16c22793678a6 voted for 9d097bb22ffdf87c7f8a403a8dc82c989790cf3b 28805
2692:X 15 Jan 2020 09:20:02.745 # +elected-leader master redis-cluster 192.168.1.48 6379
2692:X 15 Jan 2020 09:20:02.745 # +failover-state-select-slave master redis-cluster 192.168.1.48 6379
2692:X 15 Jan 2020 09:20:02.846 # -failover-abort-no-good-slave master redis-cluster 192.168.1.48 6379
2692:X 15 Jan 2020 09:20:02.902 # Next failover delay: I will not start a failover before Wed Jan 15 09:20:22 2020

Answers

Chosen as BEST ANSWER

Thanks for your answer, here is important output from info command in vm3:

127.0.0.1:6379> info
# Server
redis_version:5.0.7
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:3a8feb873b35d460
redis_mode:standalone
os:Linux 4.15.0-74-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:7.4.0
process_id:975
run_id:ea2df2c50a10ccb7468a269fb7de23f7ab25c049
tcp_port:6379
uptime_in_seconds:422584
uptime_in_days:4
hz:10
configured_hz:10
lru_clock:2447183
executable:/usr/local/bin/redis-server
config_file:/etc/redis/redis.conf

# Clients
connected_clients:1
client_recent_max_input_buffer:2
client_recent_max_output_buffer:0
blocked_clients:0

# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1579082903
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0

# Stats
total_connections_received:2
total_commands_processed:34570
instantaneous_ops_per_sec:0
total_net_input_bytes:484248
total_net_output_bytes:13401752
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0

# Replication
role:slave
master_host:192.168.1.48
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:483210
master_link_down_since_seconds:75851
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:fa17d61cf7cf47bdad1b50a51e3302cb256d326c
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:483210
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:483210

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=1,expires=0,avg_ttl=0

And here is redis-sentinel important output in vm3:

127.0.0.1:26379> info
# Server
redis_version:5.0.7
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:3a8feb873b35d460
redis_mode:sentinel
os:Linux 4.15.0-74-generic x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:7.4.0
process_id:4928
run_id:8e7ba5f508555357262db911e0863929a6311ab4
tcp_port:26379
uptime_in_seconds:14
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:2447587
executable:/home/banico/redis-server
config_file:/etc/redis/26379.conf

# Clients
connected_clients:1
client_recent_max_input_buffer:2
client_recent_max_output_buffer:0
blocked_clients:0


# Stats
total_connections_received:1
total_commands_processed:3
instantaneous_ops_per_sec:0
total_net_input_bytes:88
total_net_output_bytes:3325
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0

# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=redis-cluster,status=sdown,address=192.168.1.48:6379,slaves=2,sentinels=3

(Edit)

- VikramRawat
- January 16, 2020 at 2:14 pm
- 0 votes
0
When this happens then can you connect to the slave machine and run the command info and verify the configuration.

Login or Signup to reply.

- user13424620
- April 30, 2020 at 6:38 pm
- 0 votes
0
I understand that this answer is a bit late, but, here are some insights from my experience

Here the issue lies with slave priority.

There are many conditions depending on which master election happens.
- SLAVE/REPLICA PRIORITY should be set for each redis node in redis.conf and needs to be a DIFFERENT VALUE for any node to become the master (lower the number higher the priority)
- There should always be odd number of nodes in the cluster
- The total number of active nodes should be MAJORITY
- The Quorum(threshold set in redis.conf which indicates number of nodes) should consent/agree to the master failure.
- THERE SHOULD ATLEAST BE ONE GOOD NODE(with no bad history of failing continuously etc) such that it can be agreed by everyone to become the master.
Once all the above conditions are met, then in the order of priorities nodes take up the role of master.

PS: There is some extra logic that redis puts into apart from the above that makes the decision more solid and failproof
Please find more info HERE IN OFFICIAL DOCUMENTATION
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.