
I am trying to migrate a Cassandra cluster to Amazon Keyspaces (for Apache Cassandra).
Once the migration is done, how can I verify that the data has been migrated successfully as-is?

2 Answers


  1. Many solutions are possible. For instance, you could read all the rows of a partition, compute a checksum / signature over them, and compare it with the same checksum computed on your original data. Then iterate through all your partitions, and repeat for all your tables.
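
As a rough illustration of the checksum idea, here is a minimal Python sketch. It assumes the rows of one partition have already been fetched from each side (for example with a CQL driver) as dictionaries mapping column names to values, and that the values have a stable `repr` on both sides. `row_digest` and `partition_checksum` are hypothetical helper names, not part of any library.

```python
import hashlib
from typing import Any, Iterable, Mapping


def row_digest(row: Mapping[str, Any]) -> bytes:
    """Hash one row; columns are visited in sorted name order."""
    h = hashlib.sha256()
    for name in sorted(row):
        # Assumption: repr() of each value is identical on both clusters.
        h.update(repr((name, row[name])).encode("utf-8"))
    return h.digest()


def partition_checksum(rows: Iterable[Mapping[str, Any]]) -> str:
    """Combine row digests so the result does not depend on row order."""
    h = hashlib.sha256()
    for digest in sorted(row_digest(r) for r in rows):
        h.update(digest)
    return h.hexdigest()


# Same rows in a different order give the same checksum.
source_rows = [{"id": 1, "v": "x"}, {"id": 2, "v": "y"}]
target_rows = [{"v": "y", "id": 2}, {"v": "x", "id": 1}]
print(partition_checksum(source_rows) == partition_checksum(target_rows))
```

Sorting the per-row digests keeps the comparison independent of the order in which each cluster returns rows. Metadata such as TTLs or write timestamps is not covered unless you select it explicitly.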

  2. You could use AWS Glue to perform an ‘except’ operation. Spark has a lot of useful functions for working with massive datasets, and Glue is serverless Spark. You can use the Spark Cassandra Connector with both Cassandra and Keyspaces to work with datasets in Glue. For example, you may want to see the data that is in Cassandra but not in Keyspaces:

    cassandraTableDataframe.except(keyspacesTableDataframe)
    
    

    You could also do this by exporting both datasets to S3 and running the same comparison queries in Athena.

    Here is a helpful repository of Glue and Keyspaces functions including export, count, and distinct.
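
The `except` call in this answer only reports rows that are missing from Keyspaces. A minimal PySpark sketch that checks both directions might look like the following. It assumes both tables are already loaded as DataFrames (for example through the Spark Cassandra Connector), and `diff_tables` is a hypothetical helper name. In PySpark the equivalent of the Scala/Java `except` is `subtract`.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:  # used for type hints only; Glue/Spark supplies pyspark at runtime
    from pyspark.sql import DataFrame


def diff_tables(source_df: "DataFrame", target_df: "DataFrame"):
    """Return (rows missing from target, rows only in target)."""
    cols = sorted(source_df.columns)
    if cols != sorted(target_df.columns):
        raise ValueError("tables do not have the same columns")

    # Set operations in Spark match columns by position, so align them first.
    src = source_df.select(*cols)
    dst = target_df.select(*cols)

    missing_in_target = src.subtract(dst)  # in Cassandra, not in Keyspaces
    extra_in_target = dst.subtract(src)    # in Keyspaces, not in Cassandra
    return missing_in_target, extra_in_target
```

If `count()` on both returned DataFrames is 0, the two tables hold the same set of rows for the selected columns.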
