I have this schema in the AWS Glue job:

root
    |-- SortedLenders: array
    |    |-- element: struct
    |    |    |-- LenderID: string
    |    |    |-- MaxProfit: string
    |-- FilteredOutDecisions: array
    |    |-- element: struct
    |    |    |-- ApprovedAmount: string
    |    |    |-- Reasons: array
    |    |    |    |-- element: int

I can cast FilteredOutDecisions.ApprovedAmount from string to double using the resolveChoice() method:

test.resolveChoice(specs=[('FilteredOutDecisions[].ApprovedAmount', 'cast:double')])

But I am wondering how to cast the elements of FilteredOutDecisions.Reasons to string. Could anyone help me out with this?
Thanks, in advance!

2 Answers


  1. Chosen as BEST ANSWER

    I found a solution that worked. I converted the Glue DynamicFrame to a Spark DataFrame with df = dyf.toDF(), then applied the code below to the converted DataFrame:

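    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, ArrayType, StringType

    # Cast the whole array-of-struct column so that Reasons becomes array<string>.
    # Struct fields are matched by position, so the first field is named
    # ApprovalStatus in the result (ApprovedAmount in the schema above).
    df = df.withColumn('FilteredOutDecisions', F.col('FilteredOutDecisions').cast(ArrayType(
            StructType([
                StructField("ApprovalStatus", StringType()),
                StructField("Reasons", ArrayType(StringType()))
            ]))))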
    

  2. I would recommend using the Spark SQL StructType and StructField classes instead of Glue's purpose-built transformations. They let you define a schema for the DataFrame, including complex nested schemas, which makes it possible to cast a column to a particular type.

    Convert your Glue DynamicFrame to a Spark DataFrame, then use the classes mentioned above.
