
What is the best practice for migrating data from a large table (around 100 GB) to another table, using a query to select only a subset of the data?

I tried Spark and DSBulk, but both seem to require going through a CSV file, and that's not my scenario.

2 Answers


  1. Cassandra’s CQL does not have an equivalent of SQL’s INSERT INTO table SELECT ... command that copies data from one table into another in a single statement, because that would require a full table scan, which doesn’t scale for large datasets or large clusters.

    You will need to export the contents of the source table, do the relevant transformations to the data, then import it to the destination table.

    The simplest way to do this is to export the data to a CSV file using DSBulk, manually manipulate the CSV file so it only contains the data you want, then bulk-load the updated CSV into the new table using DSBulk.

    Since you don’t want to deal with CSV files, the only option you have is to write a Spark or Java application that reads the data from the source table, transforms it, then writes the transformed rows to the new table. Cheers!
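
    For reference, here is a minimal Spark (Scala) sketch of that approach using the DataStax spark-cassandra-connector, which reads and writes Cassandra directly with no CSV step. The connection host, keyspace, table names, and filter predicate are placeholders to adapt to your cluster and schema:

        import org.apache.spark.sql.SparkSession

        object CopyTableSubset {
          def main(args: Array[String]): Unit = {
            val spark = SparkSession.builder()
              .appName("copy-table-subset")
              // Assumes the spark-cassandra-connector package is on the classpath;
              // the host is a placeholder for your cluster.
              .config("spark.cassandra.connection.host", "127.0.0.1")
              .getOrCreate()

            // Read the source table directly from Cassandra.
            val source = spark.read
              .format("org.apache.spark.sql.cassandra")
              .options(Map("keyspace" -> "my_ks", "table" -> "source_table"))
              .load()

            // Keep only the subset you want (placeholder predicate);
            // filters on partition key columns are pushed down to Cassandra.
            val subset = source.filter("bucket = 42")

            // Append the rows into the destination table, which must already
            // exist with a compatible schema.
            subset.write
              .format("org.apache.spark.sql.cassandra")
              .options(Map("keyspace" -> "my_ks", "table" -> "target_table"))
              .mode("append")
              .save()

            spark.stop()
          }
        }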

  2. You can use the built-in point-in-time recovery (PITR) capability to copy one Amazon Keyspaces table into another.

    1. Turn on the point-in-time recovery feature on the source table
    2. Create a new table by restoring the most recent snapshot (up to a second’s granularity)

    See the documentation for more details: https://docs.aws.amazon.com/keyspaces/latest/devguide/PointInTimeRecovery.html
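
    If you prefer to run these two steps from code rather than the console, a rough Scala sketch using the DataStax Java driver is below. The keyspace and table names and the restore timestamp are placeholders, the session is assumed to be already configured for Amazon Keyspaces (TLS plus SigV4 authentication via the driver configuration file), and the exact CQL syntax should be verified against the documentation linked above:

        import com.datastax.oss.driver.api.core.CqlSession

        object RestoreViaPitr {
          def main(args: Array[String]): Unit = {
            // Assumes contact point, TLS, and SigV4 auth for Amazon Keyspaces
            // are supplied via the driver's application.conf.
            val session = CqlSession.builder().build()

            // Step 1: enable point-in-time recovery on the source table.
            session.execute(
              "ALTER TABLE my_ks.source_table " +
                "WITH custom_properties = {'point_in_time_recovery': {'status': 'enabled'}}")

            // Step 2: restore into a new table at the chosen timestamp
            // (placeholder value; it must fall within the PITR window).
            session.execute(
              "RESTORE TABLE my_ks.target_table FROM TABLE my_ks.source_table " +
                "WITH restore_timestamp = '2024-01-01T00:00:00+0000'")

            session.close()
          }
        }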
