I have a CSV file, about 900,000 rows long, sitting in an S3 bucket, and within that CSV I have two columns: phone and ttl.

I am able to successfully import this CSV into a new DynamoDB table; however, I am NOT able to specify what type each column should be (so the ttl column, for example, ends up classified as a string rather than a number).

In the CSV file itself the ttl values are not surrounded by quotation marks; they are plain numbers that are being misinterpreted.

2 Answers


  1. When using the import from S3 feature, you can only specify the types of the partition key and sort key attributes. All other attributes default to string.

    As a workaround, you could convert the CSV objects into DDB-JSON or Ion. These formats support non-string types and can be used as the source format for an import.
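
    For illustration (this sketch is not part of the original answer), a minimal version of that conversion for the phone/ttl CSV, assuming local file names and CSV headers named phone and ttl, could look like this:

    import csv
    import json

    # Hypothetical file names; in practice the CSV would come from S3 and the
    # output would be uploaded back to S3 as the import source.
    with open('phones.csv', newline='') as src, open('phones.ddb.json', 'w') as dst:
        for row in csv.DictReader(src):
            item = {
                'phone': {'S': row['phone']},  # string attribute
                'ttl': {'N': row['ttl']},      # number attribute; DynamoDB JSON encodes numbers as strings
            }
            # Import from S3 expects one JSON object per line, wrapped in "Item"
            dst.write(json.dumps({'Item': item}) + '\n')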

  2. Unfortunately, when using CSV files with DynamoDB’s import from S3 feature, non-key attributes are imported as the string type:

    When importing from CSV files, all columns other than the hash and range keys of your base table and secondary indexes are imported as DynamoDB strings.

    Source: AWS DynamoDB import documentation.

    It’s best to convert the data to DDB-JSON first so it keeps the correct types. Depending on how much data you have, you can use either:

    1. AWS Glue (large data)
    2. AWS Lambda (small data)

    When doing this, it’s useful to leverage some of the SDKs’ serialisation libraries, such as Python’s:

    https://boto3.amazonaws.com/v1/documentation/api/latest/_modules/boto3/dynamodb/types.html

    The Python one is super useful, as you can use it in both Lambda and Spark.

    from boto3.dynamodb.types import TypeDeserializer, TypeSerializer
    
    item_to_store = {
        'user_id': '12345',
        'first_name': 'Terry',
        'age': 48,  # ints (and Decimals) serialize to the DynamoDB number (N) type
    }
    
    serializer = TypeSerializer()
    
    # Serializing the whole dict wraps it in a map, e.g.
    # {'M': {'user_id': {'S': '12345'}, 'first_name': {'S': 'Terry'}, 'age': {'N': '48'}}}
    serialized_item = serializer.serialize(item_to_store)
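
    To connect this to the import, here is a hedged sketch (not from the original answer) of the small-data Lambda-style flow: read the CSV from S3, serialize each attribute individually, and write newline-delimited DynamoDB JSON back to S3 as the import source. The bucket, keys, and handler name are placeholders, and ttl is cast to int so it serializes as a number (N) rather than a string; note the serializer raises on Python floats, so use int or Decimal.

    import csv
    import io
    import json
    
    import boto3
    from boto3.dynamodb.types import TypeSerializer
    
    # Placeholder bucket and keys; adjust to your environment.
    BUCKET = 'my-bucket'
    SOURCE_KEY = 'source/phones.csv'
    TARGET_KEY = 'import/phones.ddb.json'
    
    s3 = boto3.client('s3')
    serializer = TypeSerializer()
    
    def handler(event=None, context=None):
        """Convert the CSV into newline-delimited DynamoDB JSON for import from S3."""
        text = s3.get_object(Bucket=BUCKET, Key=SOURCE_KEY)['Body'].read().decode('utf-8')
    
        lines = []
        for row in csv.DictReader(io.StringIO(text)):
            item = {
                'phone': serializer.serialize(row['phone']),   # -> {'S': '...'}
                'ttl': serializer.serialize(int(row['ttl'])),  # -> {'N': '...'}
            }
            lines.append(json.dumps({'Item': item}))
    
        s3.put_object(Bucket=BUCKET, Key=TARGET_KEY, Body='\n'.join(lines).encode('utf-8'))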
    