
DynamoDB Table:

+------------------+---------+---------------------+
|        id        |Column A |             Column B|
+------------------+---------+---------------------+
|1                 |  "155.2"|                 400 |
|2                 |      100|                 200 |
|3                 |  "455.2"|               305.5 |
|4                 |  "312.3"|                 350 |
+------------------+---------+---------------------+

Notice that Column A stores its numbers as strings, except for id = 2.

The following code reads the table contents into a Dynamic Frame:

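from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())

def create_dynamic_frame(table_name):
    # Note: table_name is not used below; the table is selected by its ARN.
    ddb_s3_bucket = "<some-s3-bucket>"
    ddb_table_arn = "<some-table-arn>"
    connection_options = {
        "dynamodb.export": "ddb",            # read through the DynamoDB export connector
        "dynamodb.unnestDDBJson": True,      # flatten the DynamoDB JSON structure
        "dynamodb.tableArn": ddb_table_arn,
        "dynamodb.s3.bucket": ddb_s3_bucket,
        "dynamodb.s3.prefix": 'temporary/ddbexport/'
    }
    dynamic_frame = glueContext.create_dynamic_frame.from_options(
        connection_type="dynamodb",
        connection_options=connection_options,
        transformation_ctx="dynamic_frame",
    )
    return dynamic_frame

dyf = create_dynamic_frame('test-table')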

The output of dyf.toDF().show() on the created Dynamic Frame:

+------------------+---------+---------------------+
|        id        |Column A |             Column B|
+------------------+---------+---------------------+
|1                 |     null|                 400 |
|2                 |     100 |                 200 |
|3                 |     null|               305.5 |
|4                 |     null|                 350 |
+------------------+---------+---------------------+

The output of dyf.toDF().printSchema():

root
 |-- id: string (nullable = true)
 |-- Column A: string (nullable = true)
 |-- Column B: string (nullable = true)

Notice that the string values in Column A come back as null. I was under the impression that Glue keeps both types in the column and that you can then use resolveChoice to cast to whichever type you want.

Is there a way I can resolve the type in the Glue-DDB connector?

I tried to resolve the types using resolveChoice:

resolved_dyf = dyf.resolveChoice(specs = [("Column A", "cast:string")])

This did not work, since the values in dyf itself are already null.

2 Answers


  1. Chosen as BEST ANSWER

    This seems to be a limitation of "connectionType": "dynamodb" when the AWS Glue DynamoDB export connector is used as the source.

    Moreover, if we use the unnestDDBJson parameter, Glue is forced to evaluate the schema for the columns. If I do not use the unnestDDBJson parameter, all column values are kept as structs, and I could not find a resource explaining how to then resolve the type.

    One workaround is to use "connectionType": "dynamodb" with the ETL connector as the source.
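
A minimal sketch of what that workaround might look like. The helper name read_with_etl_connector and its parameters are made up for illustration; "dynamodb.input.tableName" and "dynamodb.throughput.read.percent" are the connection options the Glue documentation lists for the DynamoDB ETL connector. Whether the mixed-type column then shows up as a choice type that resolveChoice can cast is something to verify against your own table.

```python
def read_with_etl_connector(glue_context, table_name, read_percent="0.5"):
    """Read a DynamoDB table through the ETL connector (a table scan, not an S3 export).

    glue_context is expected to be an awsglue GlueContext; it is passed in
    so that this helper does not depend on a global variable.
    """
    connection_options = {
        # The ETL connector addresses the table by name, not by ARN.
        "dynamodb.input.tableName": table_name,
        # Fraction of the table's read capacity the scan may consume.
        "dynamodb.throughput.read.percent": read_percent,
    }
    return glue_context.create_dynamic_frame.from_options(
        connection_type="dynamodb",
        connection_options=connection_options,
    )
```

Inside a Glue job this would be called as dyf = read_with_etl_connector(glueContext, "test-table"), followed by the resolveChoice call from the question if Column A is reported as a choice type.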


  2. When using "unnestDDBJson", Glue is forced to resolve and flatten the schema. I think you will need to avoid that in your case and do the flattening yourself after you have resolved the type.
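
To illustrate what "doing it yourself" could mean, here is a plain-Python sketch that flattens DynamoDB-JSON attribute values by hand. It assumes each item looks like {"Column A": {"S": "155.2"}} or {"Column A": {"N": "100"}}, which is how DynamoDB JSON tags string and number attributes. The helper names (unnest_value, unnest_item, to_decimal) and the sample items are hypothetical, and the id attribute is assumed to be a string. In a Glue job the same logic would have to be applied to the struct columns, for example inside a map over the records.

```python
from decimal import Decimal


def unnest_value(attr):
    """Turn one DynamoDB-JSON attribute, e.g. {"S": "155.2"}, into a plain value."""
    type_tag, value = next(iter(attr.items()))
    if type_tag in ("S", "N"):
        # Strings and numbers are both encoded as JSON strings, so "S" and "N"
        # values can share one string column without anything being dropped.
        return value
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {key: unnest_value(inner) for key, inner in value.items()}
    if type_tag == "L":
        return [unnest_value(inner) for inner in value]
    raise ValueError(f"unsupported DynamoDB type tag: {type_tag}")


def unnest_item(item):
    """Flatten every attribute of a single item."""
    return {name: unnest_value(attr) for name, attr in item.items()}


def to_decimal(value):
    """Cast a flattened string to a number once the column is uniform."""
    return None if value is None else Decimal(value)


# Sample items mirroring rows 1 and 2 of the table in the question.
items = [
    {"id": {"S": "1"}, "Column A": {"S": "155.2"}, "Column B": {"N": "400"}},
    {"id": {"S": "2"}, "Column A": {"N": "100"}, "Column B": {"N": "200"}},
]
rows = [unnest_item(item) for item in items]
column_a = [to_decimal(row["Column A"]) for row in rows]
```

Because the type tag is inspected before anything is cast, both the string-typed and the number-typed values of Column A survive, and the cast to a single type happens only afterwards.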
