What are the differences among the different disaster recovery options for databases? - Amazon Web Services

ChongLipPhang
March 3, 2023
334 views
2 votes
3 Answers

In the context of AWS databases, how do the following disaster recovery strategies differ from one another:

point-in-time recovery
backup
snapshot
Aurora backtrack

When should we choose one over the others?

Why do we need so many different options when one will suffice?

Should we try to use all of them?

Answers

Chosen as BEST ANSWER
- ChongLipPhang
- March 6, 2023 at 10:26 am
- 0 votes
One key difference between a manual snapshot and an automated backup is that a manual snapshot doesn't expire until you delete it, whereas automated backups are retained for a maximum of 35 days.

When you enable automated backups for your AWS database, AWS takes periodic backups of your database and stores them in Amazon S3. These backups serve as the starting point for PITR. AWS keeps transaction logs in S3 for up to 35 days, allowing you to perform point-in-time recovery (PITR) to any point within that timeframe.

When you initiate a PITR restore operation, AWS uses the selected backup and the transaction log to restore your database to the desired point in time. AWS first restores the backup and then applies the relevant transactions from the transaction log to the restored backup. This process brings the database to the desired point in time, allowing you to recover your data as it existed at that time.
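The restore-then-replay sequence described above can be sketched as a toy model. This is not how AWS implements PITR internally; the dictionary-based backups, the log entry format, and the function name `restore_to_point_in_time` are all assumptions made for illustration.

```python
from datetime import datetime

def restore_to_point_in_time(backups, log, target):
    """Toy PITR: start from the latest backup taken at or before `target`,
    then replay logged writes made after that backup, up to `target`."""
    candidates = [b for b in backups if b["taken_at"] <= target]
    if not candidates:
        raise ValueError("no backup exists at or before the target time")
    base = max(candidates, key=lambda b: b["taken_at"])

    state = dict(base["data"])  # copy, so the stored backup is not modified
    for entry in sorted(log, key=lambda e: e["at"]):
        if base["taken_at"] < entry["at"] <= target:
            if entry["value"] is None:   # None models a delete (assumption)
                state.pop(entry["key"], None)
            else:
                state[entry["key"]] = entry["value"]
    return state

# Hypothetical data: two daily backups and three logged changes.
backups = [
    {"taken_at": datetime(2023, 3, 1, 0, 0), "data": {"a": 1}},
    {"taken_at": datetime(2023, 3, 2, 0, 0), "data": {"a": 1, "b": 2}},
]
log = [
    {"at": datetime(2023, 3, 1, 12, 0), "key": "b", "value": 2},
    {"at": datetime(2023, 3, 2, 9, 0), "key": "a", "value": 10},
    {"at": datetime(2023, 3, 2, 15, 0), "key": "b", "value": None},  # accidental delete
]

# Restore to noon on March 2, before the accidental delete at 15:00.
restored = restore_to_point_in_time(backups, log, datetime(2023, 3, 2, 12, 0))
print(restored)
```

Choosing a target just before the unwanted change is what makes PITR useful for undoing mistakes: the 09:00 update is kept, while the 15:00 delete is never replayed.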

Aurora Backtrack lets you undo unintended or incorrect changes by rewinding the database to a specific point in time without restoring from a backup. Because no new database instance has to be created, rollbacks are fast. However, Aurora Backtrack has a maximum backtrack window of 72 hours, so you can only rewind to a point within the last 72 hours (or within the shorter window you configured). This is because Backtrack relies on the change records that Aurora retains for the backtrack window, and those records are kept for at most 72 hours.
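As a rough way to see how these limits interact, the sketch below picks a recovery option from the age of the point you want to return to. The function `choose_recovery` and its return labels are hypothetical, and the 72-hour and 35-day values are the maximums mentioned above; your actual configured windows may be shorter.

```python
from datetime import datetime, timedelta

MAX_BACKTRACK_WINDOW = timedelta(hours=72)   # Aurora Backtrack maximum
MAX_BACKUP_RETENTION = timedelta(days=35)    # automated backup maximum

def choose_recovery(target, now,
                    backtrack_window=MAX_BACKTRACK_WINDOW,
                    backup_retention=MAX_BACKUP_RETENTION):
    """Return which option could reach `target`, preferring the fastest one."""
    if target > now:
        raise ValueError("target time is in the future")
    age = now - target
    if age <= backtrack_window:
        return "backtrack"              # rewind in place, no new instance
    if age <= backup_retention:
        return "point-in-time restore"  # restore backup, replay logs
    return "manual snapshot"            # only non-expiring snapshots remain

now = datetime(2023, 3, 6, 10, 0)
print(choose_recovery(now - timedelta(hours=2), now))
print(choose_recovery(now - timedelta(days=10), now))
print(choose_recovery(now - timedelta(days=60), now))
```

This also suggests why the options coexist: each one covers a different time range and recovery speed, so a manual snapshot is the only fallback once the target is older than the automated backup retention period.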


- JohnRotenstein
- March 4, 2023 at 12:57 am
- 0 votes
‘Disaster Recovery’ is very old-world. It implies having to fail-over when a problem happens. In the cloud, however, you can focus on High Availability so that systems can recover automatically when there is a failure, without the need to ‘fail-back’ to the original system.

Therefore, the best option is to not do disaster recovery at all.

Instead, take advantage of the cloud-first design of Amazon Aurora, which automatically replicates data between multiple Availability Zones (each being a different data center).

From High availability for Amazon Aurora – Amazon Aurora:

Aurora stores copies of the data in a DB cluster across multiple Availability Zones in a single AWS Region. Aurora stores these copies regardless of whether the instances in the DB cluster span multiple Availability Zones.

When data is written to the primary DB instance, Aurora synchronously replicates the data across Availability Zones to six storage nodes associated with your cluster volume. Doing so provides data redundancy, eliminates I/O freezes, and minimizes latency spikes during system backups. Running a DB instance with high availability can enhance availability during planned system maintenance, and help protect your databases against failure and Availability Zone disruption.

If you want to use a traditional database instead (e.g. SQL Server), you can use Amazon RDS to run a Multi-AZ database. This consists of two database servers in the same Region but in different Availability Zones (which means different data centers):
- A Primary server in one AZ that is serving traffic
- A Secondary server in a different AZ (in the same Region) that is being continuously updated by the Primary server
If a failure happens with the Primary server, the Secondary server becomes the new Primary server. There is a brief outage, but no data is lost. The RDS service will then launch a new Secondary server.

Failure recovery vs Data recovery

The other options you mention (point-in-time recovery, snapshots) are focussed on recovering data that was in the database at a particular time. This is normally because somebody or something accidentally deleted or changed data and you wish to recover the data as it was at a previous time. It is good to combine both High Availability and Snapshots, although Amazon Aurora almost makes Snapshots irrelevant due to its ability to go back to a previous point in time.

Bottom line: Instead of Disaster Recovery, think High Availability.

- AsimIhsan
- March 4, 2023 at 1:34 am
- 0 votes
First of all, you need to identify the Recovery Time Objective (RTO) and
Recovery Point Objective (RPO) for
your workload. RTO is the amount of time from a disaster event to when your
system must be fully operational again. RPO is the maximum amount of data loss
that you can tolerate after a disaster event. These objectives help you
determine the appropriate level of risk and cost for your disaster recovery (DR)
plan.
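To make the two objectives concrete, here is a small sketch that measures what a single incident actually achieved and compares it with the targets. The function names and the example timestamps are made up for illustration; only the RTO and RPO definitions come from the text above.

```python
from datetime import datetime, timedelta

def achieved_rpo(last_recoverable_point, disaster_at):
    """Data-loss window: time between the last recoverable state and the disaster."""
    return disaster_at - last_recoverable_point

def achieved_rto(disaster_at, service_restored_at):
    """Downtime: time between the disaster and the system being operational again."""
    return service_restored_at - disaster_at

def meets_objectives(rto, rpo, rto_objective, rpo_objective):
    """Return (rto_ok, rpo_ok) for one incident."""
    return rto <= rto_objective, rpo <= rpo_objective

# Hypothetical incident: nightly backup at 02:00, failure at 09:30, back up at 11:00.
last_backup = datetime(2023, 3, 4, 2, 0)
disaster = datetime(2023, 3, 4, 9, 30)
restored = datetime(2023, 3, 4, 11, 0)

rpo = achieved_rpo(last_backup, disaster)
rto = achieved_rto(disaster, restored)
rto_ok, rpo_ok = meets_objectives(
    rto, rpo,
    rto_objective=timedelta(hours=2),
    rpo_objective=timedelta(hours=1),
)
print(f"RTO {rto} ok={rto_ok}, RPO {rpo} ok={rpo_ok}")
```

In this made-up case the downtime target is met but 7.5 hours of writes are lost, which is the kind of gap that pushes a workload from nightly backups toward continuous log-based recovery or replication.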

According to AWS documentation, there are four main DR strategies that you can
use on AWS:
1. Backup and restore – back up your systems and restore them from backup if
  disaster strikes. This is low-cost but high-risk, as it has a high RTO and
  RPO.
2. Pilot light – replicate your data and core elements to another Region and
  scale up when needed. This reduces the RTO and RPO but requires some manual
  intervention.
3. Warm standby – run a scaled-down version of your system in another Region
  that can handle minimal traffic. This allows you to switch over quickly with
  minimal downtime. This further reduces the RTO and RPO but increases the cost
  and complexity.
4. Multi-site active/active – run your system across multiple Regions with
  load balancing and synchronization. This provides the highest availability
  and resilience, as well as the lowest RTO and RPO possible. However, this
  also requires the most cost and complexity.
Your question only focuses on different backup and restore strategies. They are
all different ways of restoring your database state from a specific point in
time using AWS services such as Amazon Relational Database Service (RDS), Amazon
Aurora, or Amazon DynamoDB.

However, these options do not cover other aspects of DR such as scaling up
resources, switching over traffic, or synchronizing data across Regions. Some
services like Amazon Aurora natively support multi-site active/active DR, but
others like RDS do not. Therefore, you need to first focus on the RTO and RPO
objectives for your workload before choosing a DR strategy. Also please refer to
Disaster Recovery on AWS.
