I have a DataFrame with a single column which is a struct type and contains an array.
users_tp_df.printSchema()
root
 |-- x: struct (nullable = true)
 |    |-- ActiveDirectoryName: string (nullable = true)
 |    |-- AvailableFrom: string (nullable = true)
 |    |-- AvailableFutureAllocation: long (nullable = true)
 |    |-- AvailableFutureHours: double (nullable = true)
 |    |-- CreateDate: string (nullable = true)
 |    |-- CurrentAllocation: long (nullable = true)
 |    |-- CurrentAvailableHours: double (nullable = true)
 |    |-- CustomFields: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- Name: string (nullable = true)
 |    |    |    |-- Type: string (nullable = true)
 |    |    |    |-- Value: string (nullable = true)
I’m trying to convert the CustomFields array column into three columns:
- Country;
- isExternal;
- Service.
For example, I have these values:
and the expected final DataFrame output for that row is:
Can anyone please help me achieve this?
Thank you!
2 Answers
Considering the mockup structure below, similar to the one in your example,
you can do it the SQL way by using the inline function:
The result:
Mockup structure:
This would work:
Sample Input:
JSON:
{'x': {'CurrentAvailableHours': 2, 'CustomFields': [{'Name': 'Country', 'Value': 'Italy'}, {'Name': 'Service', 'Value':'Dev'}]}}
Input Structure:
Output:
Output Structure (Id can be dropped):