i have a three node redis and 3 node sentinel, everything is ok and all master and slaves are verified and sentinel config file is updated with all redis and sentinel nodes, but the problem is when redis master is down and sentinel wants to elect the failed master again and wont elect between other slaves to choose new master,here is my config files and logs.
vm1: redis master and sentinel1 192.168.1.48
vm2: redis slave and sentinel2 192.168.1.51
vm3: redis slave and sentinel3 192.168.1.52
Redis master conf: (vm1)
bind 192.168.1.48 127.0.0.1
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 60
daemonize no
supervised systemd
pidfile /var/run/redis_6379.pid
loglevel notice
logfile ""
databases 16
always-show-logo yes
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/redis
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
replica-priority 100
requirepass 123456789
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
one of sentinel configs: (vm1)
#1
bind 0.0.0.0
port 26379
#2
sentinel myid 7e09f70bc68cdc0afee3d8cd9bdf3fe6f320a3d5
sentinel deny-scripts-reconfig yes
sentinel monitor redis-cluster 192.168.1.48 6379 3
sentinel down-after-milliseconds redis-cluster 5000
#3
sentinel failover-timeout redis-cluster 1000
sentinel parallel-syncs redis-cluster 1
#misc
daemonize yes
pidfile "/var/run/redis_26379.pid"
logfile "/var/log/redis_26379.log"
dir "/var/lib/redis"
##############
# Generated by CONFIG REWRITE
protected-mode no
sentinel auth-pass redis-cluster 123456789
sentinel config-epoch redis-cluster 0
sentinel leader-epoch redis-cluster 52
sentinel known-replica redis-cluster 192.168.1.51 6379
sentinel known-replica redis-cluster 192.168.1.52 6379
sentinel known-sentinel redis-cluster 192.168.1.52 26379 0b37aa7287e89ad38a90a97cdff16c22793678a6
sentinel known-sentinel redis-cluster 192.168.1.51 26379 9d097bb22ffdf87c7f8a403a8dc82c989790cf3b
sentinel current-epoch 52
another Sentinel config: (vm2)
#1
bind 0.0.0.0
port 26379
#2
sentinel myid 9d097bb22ffdf87c7f8a403a8dc82c989790cf3b
daemonize yes
pidfile "/var/run/redis_26379.pid"
sentinel deny-scripts-reconfig yes
sentinel monitor redis-cluster 192.168.1.48 6379 2
sentinel down-after-milliseconds redis-cluster 5000
sentinel failover-timeout redis-cluster 10000
#3
sentinel auth-pass redis-cluster 123456789
#misc
logfile "/var/log/redis_26379.log"
dir "/var/lib/redis"
# Generated by CONFIG REWRITE
protected-mode no
sentinel config-epoch redis-cluster 0
sentinel leader-epoch redis-cluster 28811
sentinel known-replica redis-cluster 192.168.1.51 6379
sentinel known-replica redis-cluster 192.168.1.52 6379
sentinel known-sentinel redis-cluster 192.168.1.52 26379 0b37aa7287e89ad38a90a97cdff16c22793678a6
sentinel known-sentinel redis-cluster 192.168.1.48 26379 7e09f70bc68cdc0afee3d8cd9bdf3fe6f320a3d5
sentinel current-epoch 28811
Sentinel log after master failure: (vm2)
2692:X 15 Jan 2020 09:19:42.576 # +vote-for-leader 0b37aa7287e89ad38a90a97cdff16c22793678a6 28804
2692:X 15 Jan 2020 09:19:42.582 # Next failover delay: I will not start a failover before Wed Jan 15 09:20:02 2020
2692:X 15 Jan 2020 09:20:02.659 # +new-epoch 28805
2692:X 15 Jan 2020 09:20:02.660 # +try-failover master redis-cluster 192.168.1.48 6379
2692:X 15 Jan 2020 09:20:02.662 # +vote-for-leader 9d097bb22ffdf87c7f8a403a8dc82c989790cf3b 28805
2692:X 15 Jan 2020 09:20:02.674 # 0b37aa7287e89ad38a90a97cdff16c22793678a6 voted for 9d097bb22ffdf87c7f8a403a8dc82c989790cf3b 28805
2692:X 15 Jan 2020 09:20:02.745 # +elected-leader master redis-cluster 192.168.1.48 6379
2692:X 15 Jan 2020 09:20:02.745 # +failover-state-select-slave master redis-cluster 192.168.1.48 6379
2692:X 15 Jan 2020 09:20:02.846 # -failover-abort-no-good-slave master redis-cluster 192.168.1.48 6379
2692:X 15 Jan 2020 09:20:02.902 # Next failover delay: I will not start a failover before Wed Jan 15 09:20:22 2020
3
Answers
Thanks for your answer, here is important output from info command in vm3:
And here is redis-sentinel important output in vm3:
When this happens then can you connect to the slave machine and run the command info and verify the configuration.
I understand that this answer is a bit late, but, here are some insights from my experience
Here the issue lies with slave priority.
There are many conditions depending on which master election happens.
SLAVE/REPLICA PRIORITY should be set for each redis node in redis.conf and needs to be a DIFFERENT VALUE for any node to become the master (lower the number higher the priority)
There should always be odd number of nodes in the cluster
Once all the above conditions are met, then in the order of priorities nodes take up the role of master.
PS: There is some extra logic that redis puts into apart from the above that makes the decision more solid and failproof
Please find more info HERE IN OFFICIAL DOCUMENTATION