I am trying to migrate Cassandra cluster onto AWS Keyspaces for Apache Cassandra.
After the migration is done how can I verify that the data has been migrated successfully as-is?
Question posted in Amazon Web Sevices
The official Amazon Web Services documentation can be found here.
The official Amazon Web Services documentation can be found here.
2
Answers
Many solutions are possible, you could simply read all rows of a partition and compute a checksum / signature and compare with your original data for instance.Then iterating through all your partitions, then doing it for all your tables. Checksums work.
You could use AWS Glue to perform an ‘except’ function. Spark has a lot of usefull functions for working with massive datasets. Glue is serverless spark. You can use the spark cassandra connector with Cassandra and Keyspaces to work with datasets in glue. For example you may want to see the data that is not in Keyspaces.
You could also do this by exporting both datasets to s3 and performing these queries in Athena.
Here is a helpful repository of Glue and Keyspaces functions including export, count, and distinct.