In the context of AWS databases, how do the following disaster recovery strategies differ from one another:

  • point-in-time recovery
  • backup
  • snapshot
  • Aurora backtrack

When should we choose one over the others?

Why do we need so many different options when one will suffice?

Should we try to use all of them?

Answers


  1. Chosen as BEST ANSWER

    One key difference between a manual snapshot and an automated backup is that a manual snapshot doesn't expire (it is kept until you delete it), whereas automated backups are kept only for the backup retention period, which can be at most 35 days.
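
    A minimal sketch of the two retention models, assuming `rds` is a `boto3.client("rds")` and a single RDS DB instance (the identifier `orders-db` and the snapshot naming scheme are hypothetical). For an Aurora cluster the equivalent calls are `modify_db_cluster` and `create_db_cluster_snapshot`.

```python
from datetime import datetime, timezone

# Automated backups are kept only for the retention period (0 disables them on RDS).
MAX_RETENTION_DAYS = 35


def set_backup_retention(rds, instance_id, days):
    """Change how long automated backups and transaction logs are kept."""
    if not 0 <= days <= MAX_RETENTION_DAYS:
        raise ValueError(f"retention must be 0-{MAX_RETENTION_DAYS} days, got {days}")
    return rds.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        BackupRetentionPeriod=days,
        ApplyImmediately=True,
    )


def take_manual_snapshot(rds, instance_id, label):
    """Create a manual snapshot, which is kept until someone deletes it.

    The '<instance>-<label>-<UTC timestamp>' naming scheme is a hypothetical
    convention, not something RDS requires.
    """
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    snapshot_id = f"{instance_id}-{label}-{stamp}"
    rds.create_db_snapshot(
        DBSnapshotIdentifier=snapshot_id,
        DBInstanceIdentifier=instance_id,
    )
    return snapshot_id
```

    For example, `take_manual_snapshot(rds, "orders-db", "pre-migration")` before a risky schema change gives you a restore point that outlives the 35-day limit.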

    When you enable automated backups for your AWS database, AWS takes periodic backups of your database and stores them in Amazon S3. These backups serve as the starting point for point-in-time recovery (PITR). AWS also keeps the database's transaction logs in S3 for the length of the backup retention period (up to 35 days), so you can restore to any point within that timeframe, up to the latest restorable time, which is usually within the last five minutes.
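
    The restorable window can be checked before starting a restore. The sketch below is a simplified model, assuming the earliest restorable time is roughly "now minus the retention period" (in practice it also depends on when the first automated backup was taken); the latest restorable time is the value RDS reports as `LatestRestorableTime`.

```python
from datetime import datetime, timedelta, timezone


def pitr_window(retention_days, latest_restorable, now=None):
    """Approximate (earliest, latest) restorable times for a DB instance."""
    if now is None:
        now = datetime.now(timezone.utc)
    earliest = now - timedelta(days=retention_days)  # simplification, see above
    return earliest, latest_restorable


def can_restore_to(target, retention_days, latest_restorable, now=None):
    """True if `target` falls inside the approximate PITR window."""
    earliest, latest = pitr_window(retention_days, latest_restorable, now)
    return earliest <= target <= latest
```

    With boto3, `latest_restorable` and `retention_days` would come from the `LatestRestorableTime` and `BackupRetentionPeriod` fields of `rds.describe_db_instances(DBInstanceIdentifier=...)["DBInstances"][0]`.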

    When you initiate a PITR restore, AWS starts from a backup taken before the target time and then replays the transaction logs up to that time, so the data is as it existed at that moment. The restore always creates a new DB instance (or, for Aurora, a new DB cluster); the original database is left untouched, so once the restore finishes you must point your application at the new endpoint.
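
    A minimal sketch of the restore call, assuming `rds` is a `boto3.client("rds")` and the source is an RDS DB instance (identifiers are hypothetical). The target identifier must be new, because the restore creates a separate instance.

```python
def restore_instance_to_time(rds, source_id, target_id, restore_time=None):
    """Restore `source_id` into a NEW instance `target_id`.

    With restore_time=None the latest restorable time is used; otherwise
    restore_time must be a timezone-aware datetime inside the PITR window.
    """
    params = {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
    }
    if restore_time is None:
        params["UseLatestRestorableTime"] = True
    else:
        params["RestoreTime"] = restore_time
    response = rds.restore_db_instance_to_point_in_time(**params)
    # Block until the new instance is available before repointing the app.
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=target_id)
    return response
```

    For Aurora the equivalent call is `restore_db_cluster_to_point_in_time` (with `SourceDBClusterIdentifier`, `DBClusterIdentifier` and `RestoreToTime`); it creates a cluster without instances, so you also add one with `create_db_instance`.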

    Aurora Backtrack lets you undo unintended or incorrect changes by rewinding the existing DB cluster in place to an earlier point in time, without restoring from a backup. Because no new instance or cluster is created, a rollback typically takes minutes rather than the longer time a restore needs. However, Backtrack is available only for Aurora MySQL, has to be enabled on the cluster in advance, and has a maximum backtrack window of 72 hours: Aurora keeps the change records needed to rewind only for the target window you configure, so you can never roll back further than 72 hours. The transaction logs used for PITR, by contrast, are kept for the whole backup retention period.
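
    A minimal sketch of a guarded backtrack, assuming `rds` is a `boto3.client("rds")` and an Aurora MySQL cluster configured with a nonzero `BacktrackWindow` (in seconds, at most 259,200, i.e. 72 hours). The cluster name is hypothetical. Unlike a point-in-time restore, nothing new is created: the same cluster is rewound.

```python
def backtrack_cluster(rds, cluster_id, target_time):
    """Rewind an Aurora MySQL cluster in place to `target_time`.

    Checks the cluster's backtrack configuration first so a bad target
    fails with a clear message instead of an API error.
    """
    cluster = rds.describe_db_clusters(DBClusterIdentifier=cluster_id)["DBClusters"][0]
    if not cluster.get("BacktrackWindow"):
        raise RuntimeError(f"Backtrack is not enabled on {cluster_id}")
    earliest = cluster["EarliestBacktrackTime"]
    if target_time < earliest:
        raise ValueError(
            f"{target_time.isoformat()} is earlier than the earliest backtrack time "
            f"{earliest.isoformat()}; use point-in-time restore instead"
        )
    return rds.backtrack_db_cluster(DBClusterIdentifier=cluster_id, BacktrackTo=target_time)
```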


  2. ‘Disaster Recovery’ is a very old-world concept. It implies failing over to a separate system when a problem happens and later ‘failing back’ to the original. In the cloud, you can instead focus on High Availability, so that systems recover automatically from a failure without any need to fail back.

    Therefore, the best option is not to do disaster recovery at all.

    Instead, take advantage of the cloud-native design of Amazon Aurora, which automatically replicates data across multiple Availability Zones (each AZ consisting of one or more physically separate data centers).

    From the AWS documentation page "High availability for Amazon Aurora":

    Aurora stores copies of the data in a DB cluster across multiple Availability Zones in a single AWS Region. Aurora stores these copies regardless of whether the instances in the DB cluster span multiple Availability Zones.

    When data is written to the primary DB instance, Aurora synchronously replicates the data across Availability Zones to six storage nodes associated with your cluster volume. Doing so provides data redundancy, eliminates I/O freezes, and minimizes latency spikes during system backups. Running a DB instance with high availability can enhance availability during planned system maintenance, and help protect your databases against failure and Availability Zone disruption.
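
    The redundancy claim can be made concrete with a little arithmetic. AWS's published description of Aurora's storage design (the 2017 SIGMOD paper on Aurora) places two of the six copies in each of three Availability Zones, with a write quorum of 4 of 6 and a read quorum of 3 of 6. The toy model below is a simplification, not Aurora's implementation; it shows why losing a whole AZ still allows writes, and losing an AZ plus one more node still allows reads.

```python
AZ_COUNT = 3
COPIES_PER_AZ = 2
WRITE_QUORUM = 4  # of 6 copies
READ_QUORUM = 3   # of 6 copies


def surviving_copies(lost_azs=0, extra_lost_nodes=0):
    """Copies still reachable after whole AZs and individual nodes fail."""
    total = AZ_COUNT * COPIES_PER_AZ
    return max(0, total - lost_azs * COPIES_PER_AZ - extra_lost_nodes)


def can_write(lost_azs=0, extra_lost_nodes=0):
    return surviving_copies(lost_azs, extra_lost_nodes) >= WRITE_QUORUM


def can_read(lost_azs=0, extra_lost_nodes=0):
    return surviving_copies(lost_azs, extra_lost_nodes) >= READ_QUORUM


for azs, nodes in [(0, 0), (1, 0), (1, 1), (2, 0)]:
    print(f"lost AZs={azs}, extra nodes={nodes}: "
          f"write={can_write(azs, nodes)}, read={can_read(azs, nodes)}")
```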

    If you want to use a traditional database engine instead (e.g., SQL Server), you can use Amazon RDS to run a Multi-AZ deployment. This consists of two database servers in the same Region but in different Availability Zones (that is, in different data centers):

    • A Primary server in one AZ that is serving traffic
    • A Secondary (standby) server in a different AZ (in the same Region) that the Primary server updates synchronously

    If the Primary server fails, RDS promotes the Secondary server to be the new Primary and points the database's DNS endpoint at it. There is a brief outage (typically one to two minutes), but because replication is synchronous, no committed data is lost. RDS then launches a new Secondary server.
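
    A minimal sketch, assuming `rds` is a `boto3.client("rds")` and an existing single-AZ RDS DB instance (the identifier is hypothetical). The second function is a failover drill: a reboot with `ForceFailover=True` promotes the standby, which is a practical way to measure how long the brief outage actually lasts for your application.

```python
def enable_multi_az(rds, instance_id, apply_now=False):
    """Convert a single-AZ RDS instance into a Multi-AZ deployment.

    With apply_now=False the change waits for the next maintenance window.
    """
    return rds.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        MultiAZ=True,
        ApplyImmediately=apply_now,
    )


def rehearse_failover(rds, instance_id):
    """Reboot with failover so the standby is promoted (a failover drill)."""
    info = rds.describe_db_instances(DBInstanceIdentifier=instance_id)["DBInstances"][0]
    if not info.get("MultiAZ"):
        raise RuntimeError(f"{instance_id} is not Multi-AZ; there is no standby to fail over to")
    return rds.reboot_db_instance(DBInstanceIdentifier=instance_id, ForceFailover=True)
```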

    Failure recovery vs Data recovery

    The other options you mention (point-in-time recovery, snapshots) focus on recovering the data that was in the database at a particular time. You normally need them because somebody or something accidentally deleted or changed data and you wish to recover the data as it was earlier. High Availability does not help in that case, because the bad change is replicated to every copy straight away. It is good to combine High Availability with Snapshots, although Amazon Aurora almost makes Snapshots irrelevant due to its ability to go back to a previous point in time (within its backtrack window or backup retention period).
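
    The trade-off between these mechanisms can be summarized as a rule of thumb: prefer the least disruptive option that can still reach a point before the mistake. The sketch below encodes that rule; it is a hypothetical decision helper, not an AWS API, and it assumes a manual snapshot from before the mistake exists when the other options are out of range.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryOptions:
    is_aurora_mysql: bool
    backtrack_window: timedelta  # timedelta(0) if Backtrack is not enabled (max 72 h)
    retention_period: timedelta  # automated backup retention (max 35 days)


def pick_recovery(age_of_mistake, opts):
    """Pick the least disruptive mechanism that can still reach a point
    just before a mistake made `age_of_mistake` ago."""
    backtrack_on = opts.is_aurora_mysql and opts.backtrack_window > timedelta(0)
    if backtrack_on and age_of_mistake <= opts.backtrack_window:
        return "backtrack"  # rewinds the existing cluster in place
    if age_of_mistake <= opts.retention_period:
        return "point-in-time restore"  # new instance/cluster from backup + logs
    # Older than the retention period: only a manual snapshot taken before
    # the mistake (assumed to exist) can help.
    return "manual snapshot"
```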

    Bottom line: Instead of Disaster Recovery, think High Availability.

  3. First of all, you need to identify the Recovery Time Objective (RTO) and
    Recovery Point Objective (RPO) for your workload. RTO is the amount of time
    from a disaster event to when your system must be fully operational again.
    RPO is the maximum amount of data loss that you can tolerate, measured as
    the time between the last recovery point and the disaster event. These
    objectives help you determine the appropriate level of risk and cost for
    your disaster recovery (DR) plan.
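
    RPO is easiest to reason about as arithmetic: if recovery points are created every T minutes, a failure just before the next one loses up to T minutes of changes. A minimal sketch, where the intervals are illustrative assumptions (a once-a-day manual snapshot versus transaction logs that RDS uploads to S3 roughly every five minutes):

```python
from datetime import timedelta

# Illustrative recovery-point intervals (assumptions, not guarantees).
RECOVERY_POINT_INTERVAL = {
    "daily manual snapshot": timedelta(hours=24),
    "automated backups + PITR": timedelta(minutes=5),
}


def worst_case_loss(option):
    """A failure just before the next recovery point loses one full interval."""
    return RECOVERY_POINT_INTERVAL[option]


def options_meeting_rpo(rpo):
    """Names of the options whose worst-case data loss fits within `rpo`."""
    return sorted(name for name in RECOVERY_POINT_INTERVAL if worst_case_loss(name) <= rpo)


print(options_meeting_rpo(timedelta(hours=1)))
# ['automated backups + PITR']
```

    RTO cannot be derived the same way; it has to be measured by actually rehearsing a restore or failover.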

    According to the AWS documentation, there are four main DR strategies that
    you can use on AWS:

    1. Backup and restore – back up your systems and restore them from backup if
      disaster strikes. This is low-cost but high-risk, as it has a high RTO and
      RPO.
    2. Pilot light – replicate your data and core elements to another Region and
      scale up when needed. This reduces the RTO and RPO but requires some manual
      intervention.
    3. Warm standby – run a scaled-down version of your system in another Region
      that can handle minimal traffic. This allows you to switch over quickly with
      minimal downtime. This further reduces the RTO and RPO but increases the cost
      and complexity.
    4. Multi-site active/active – run your system across multiple Regions with
      load balancing and synchronization. This provides the highest availability
      and resilience, as well as the lowest RTO and RPO possible. However, this
      also requires the most cost and complexity.
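
    The list is ordered from cheapest to most expensive, which suggests a simple selection rule: take the first strategy whose typical recovery fits your tightest objective. In the sketch below the thresholds are illustrative assumptions, loosely following the orders of magnitude usually quoted for each strategy (hours, tens of minutes, minutes, near zero); they are not AWS guarantees.

```python
from datetime import timedelta

# (strategy, rough RTO/RPO it can deliver), cheapest first.
# The numbers are illustrative assumptions, not AWS guarantees.
STRATEGIES = [
    ("backup and restore", timedelta(hours=4)),
    ("pilot light", timedelta(minutes=30)),
    ("warm standby", timedelta(minutes=5)),
    ("multi-site active/active", timedelta(0)),
]


def cheapest_strategy(rto, rpo):
    """Return the cheapest strategy whose typical recovery fits both objectives."""
    tightest = min(rto, rpo)
    for name, typical in STRATEGIES:
        if typical <= tightest:
            return name
    return STRATEGIES[-1][0]  # only reachable for negative objectives
```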

    Your question only focuses on different backup and restore strategies. They are
    all different ways of restoring your database state from a specific point in
    time using AWS services such as Amazon Relational Database Service (RDS), Amazon
    Aurora, or Amazon DynamoDB.
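
    For DynamoDB the same idea applies at table level. A minimal sketch, assuming `dynamodb` is a `boto3.client("dynamodb")` (table names are hypothetical); as with RDS, the restore writes to a new table rather than overwriting the source.

```python
def enable_dynamodb_pitr(dynamodb, table_name):
    """Turn on continuous backups (point-in-time recovery) for a table."""
    return dynamodb.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )


def restore_dynamodb_table(dynamodb, source, target, restore_time=None):
    """Restore `source` into a NEW table `target`; the source is left untouched."""
    params = {"SourceTableName": source, "TargetTableName": target}
    if restore_time is None:
        params["UseLatestRestorableTime"] = True
    else:
        params["RestoreDateTime"] = restore_time
    return dynamodb.restore_table_to_point_in_time(**params)
```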

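    However, these options do not cover other aspects of DR, such as scaling up
    resources, switching over traffic, or synchronizing data across Regions. Some
    services have cross-Region replication built in: Amazon Aurora Global Database
    replicates a cluster from one primary Region (which accepts writes) to
    read-only secondary Regions that can be promoted during a regional outage,
    while Amazon RDS offers cross-Region read replicas that you promote yourself.
    Therefore, you need to first focus on the RTO and RPO objectives for your
    workload before choosing a DR strategy. Also, please refer to Disaster
    Recovery on AWS.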
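
    A simple way to extend the backup and restore strategy to a regional outage is to keep copies of snapshots in a second Region. A minimal sketch, assuming `rds_dr` is a `boto3.client("rds", region_name=...)` created in the destination Region; the snapshot ARN, identifiers and KMS key are hypothetical. For Aurora the equivalent call is `copy_db_cluster_snapshot`.

```python
def copy_snapshot_to_dr_region(rds_dr, source_snapshot_arn, target_snapshot_id,
                               source_region, kms_key_id=None):
    """Copy an RDS snapshot into the Region that `rds_dr` is connected to.

    The source is identified by ARN because it lives in another Region;
    boto3 uses SourceRegion to generate the presigned URL the API needs.
    """
    params = {
        "SourceDBSnapshotIdentifier": source_snapshot_arn,
        "TargetDBSnapshotIdentifier": target_snapshot_id,
        "SourceRegion": source_region,
    }
    if kms_key_id is not None:
        # Encrypted snapshots must be re-encrypted with a key in the target Region.
        params["KmsKeyId"] = kms_key_id
    return rds_dr.copy_db_snapshot(**params)
```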
