We have a DynamoDB table and would like to export its data to a staging table in a new Redshift database on a nightly basis. Ideally, I think it would be better to export only the inserts and updates since the last load; otherwise we would be exporting a lot of data each night. I'm wondering what the best way to go about this is. At present we export the data to a Postgres database using Fivetran.
I have looked into using AWS Glue ETL to write the data to S3 or straight to Redshift, but I do not see an option in Glue to pick up only the last day's data. The table itself also does not have a field such as last_update_date, but I was wondering whether that information is stored somewhere I can use, since when I click on incremental export for the DynamoDB table I can select a time period.
2 Answers
Zero ETL to Redshift
Your best option is the newly announced DynamoDB zero-ETL integration with Redshift, which is currently in preview:
https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-dynamodb-zero-etl-integration-redshift/
Incremental Export
The next best option is incremental export to S3. It exports only the items that changed within the time period you select, along with timestamps showing when they were modified. Note that it holds only the final image of each item; it does not contain every change made to a single item during that period.
https://aws.amazon.com/about-aws/whats-new/2023/09/incremental-export-s3-amazon-dynamodb/
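For a nightly job, the export can be started from code. The following is a minimal sketch using boto3's `export_table_to_point_in_time`, assuming point-in-time recovery is enabled on the table. The table ARN, bucket name, and prefix are hypothetical placeholders, and the 24-hour window is an assumption chosen to match a nightly schedule.

```python
from datetime import datetime, timedelta, timezone


def build_incremental_export_request(table_arn, bucket, prefix, export_to, hours=24):
    """Build the arguments for an incremental export covering the `hours` before `export_to`."""
    export_from = export_to - timedelta(hours=hours)
    return {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "S3Prefix": prefix,
        "ExportFormat": "DYNAMODB_JSON",
        "ExportType": "INCREMENTAL_EXPORT",
        "IncrementalExportSpecification": {
            "ExportFromTime": export_from,
            "ExportToTime": export_to,
            # NEW_IMAGE exports only the final state of each changed item.
            "ExportViewType": "NEW_IMAGE",
        },
    }


def start_nightly_export(table_arn, bucket, prefix):
    """Start the export and return its ARN (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the request builder works without boto3 installed

    client = boto3.client("dynamodb")
    # Round down to the hour so consecutive nightly windows line up.
    now = datetime.now(timezone.utc).replace(minute=0, second=0, microsecond=0)
    request = build_incremental_export_request(table_arn, bucket, prefix, export_to=now)
    response = client.export_table_to_point_in_time(**request)
    return response["ExportDescription"]["ExportArn"]


if __name__ == "__main__":
    # Hypothetical values for illustration only; nothing is sent to AWS here.
    example = build_incremental_export_request(
        "arn:aws:dynamodb:us-east-1:123456789012:table/ExampleTable",
        "example-export-bucket",
        "dynamodb/nightly/",
        export_to=datetime(2024, 1, 2, tzinfo=timezone.utc),
    )
    print(example["IncrementalExportSpecification"])
```

The export runs asynchronously, so the Redshift load step would need to wait for it to complete before reading the files from S3.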
I think Peaka's Zero-ETL model could be very useful here as well. They seem to offer both Redshift and DynamoDB as integrations.
https://www.peaka.com/integrations/