I have read about the automatic failover feature of AWS ElastiCache for Redis. The documentation says that failover requires at least one replica node (i.e. at least two nodes in total), so that the replica can be promoted to replace the failed primary node.

But I cannot find any details about what happens if I have only one node and it fails. Is it re-created automatically, or do I need to drop and re-create it manually?

I intend to create a Redis replication group (cluster mode disabled) with only one node in my test environment, using the following CloudFormation template.

    "ReplicationGroup": {
        "Type": "AWS::ElastiCache::ReplicationGroup",
        "Properties": {
            "ReplicationGroupId" : "my-redis",
            "ReplicationGroupDescription" : "My Redis",
            "NumCacheClusters": 1,
            "AutomaticFailoverEnabled": false,
            "CacheNodeType": "cache.t3.medium",
            "CacheParameterGroupName" : "default.redis5.0",
            "Engine": "redis",
            "EngineVersion" : "5.0.6",
            "Port": "6379",
            "AtRestEncryptionEnabled" : true,
            "TransitEncryptionEnabled" : true,
            "AuthToken" : {"Ref": "AuthToken"},
            "CacheSubnetGroupName": {"Ref": "SubnetGroup"},
            "SecurityGroupIds": [
                {"Ref": "RedisSecurityGroup"}
            ],
            "SnapshotRetentionLimit": 0,
            "MultiAZEnabled" : {"Fn::If": ["ConditionMultiAZEnabled", true, false]}
        }
    },

2 Answers


  1. The process depends on the failure scenario.

    A single node lives in one Availability Zone, so if that AZ has issues your node may be affected, and there is little you can do to mitigate it. To restore access you would need to create another node in a different AZ.

    If it is an underlying host failure (e.g. the rack loses power or the physical server needs to reboot), AWS will try to migrate the node to another host in the same Availability Zone.

    Most managed services follow the same recovery process as EC2 hosts, because that is what the services run on under the hood.

  2. We faced this issue before. While AWS was installing an important security update, we lost all our data (the service update SLA was not met). It was a single-node ElastiCache instance. Here is the reply from AWS Support, which contains all the details:

    As you said, I found event messages on the cluster, and BytesUsedForCache dropped to 0. When I investigated the Redis node, I could see that the health check from the ElastiCache service failed because of a hardware failure, and the node ***** was replaced with a healthy new node to recover the Redis service. Because the Redis cluster ***** has only a single node *****, data loss can happen whenever the node fails, as in this case.

    To improve the availability of the Redis cluster and keep your data when a node fails, you should create a replication group by adding at least one replica node to the cluster. Please read this link to understand replication groups in detail: https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Replication.html

    A replica node can serve only read requests, but data is always replicated from the primary node to the replica node. A replica node can also be promoted to the new primary when the primary fails, which protects your data. This link explains how to add a replica node: https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Replication.AddReadReplica.html

    Furthermore, you can also enable Multi-AZ with automatic failover on the replication group. It fails over the primary node automatically when the primary node fails, which further improves the high availability of your Redis cluster. https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/AutoFailover.html
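
For reference, the advice in both answers could be applied to the template from the question roughly as follows. This is only a sketch, not a tested template: it reuses the question's own values and references (`AuthToken`, `SubnetGroup`, `RedisSecurityGroup`) and changes just three properties, `NumCacheClusters`, `AutomaticFailoverEnabled` and `MultiAZEnabled`. It assumes the subnet group covers at least two Availability Zones and that the chosen node type and engine version support Multi-AZ with automatic failover, which is worth confirming in the ElastiCache documentation before deploying.

```json
{
    "ReplicationGroup": {
        "Type": "AWS::ElastiCache::ReplicationGroup",
        "Properties": {
            "ReplicationGroupId": "my-redis",
            "ReplicationGroupDescription": "My Redis",
            "NumCacheClusters": 2,
            "AutomaticFailoverEnabled": true,
            "MultiAZEnabled": true,
            "CacheNodeType": "cache.t3.medium",
            "CacheParameterGroupName": "default.redis5.0",
            "Engine": "redis",
            "EngineVersion": "5.0.6",
            "Port": 6379,
            "AtRestEncryptionEnabled": true,
            "TransitEncryptionEnabled": true,
            "AuthToken": {"Ref": "AuthToken"},
            "CacheSubnetGroupName": {"Ref": "SubnetGroup"},
            "SecurityGroupIds": [
                {"Ref": "RedisSecurityGroup"}
            ],
            "SnapshotRetentionLimit": 0
        }
    }
}
```

With `NumCacheClusters` set to 2, the group has one primary and one replica, so a failed primary can be replaced by promoting the replica instead of starting from an empty node. A single-node test environment can keep the original template, accepting that a node replacement leaves the cache empty.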
