I need to flatten JSON file so that I can get output in table format.Ihavetried but not getting the output that I want
This is my JSON file :-
{
"records": [
{
"name": "A",
"last_name": "B",
"special_values": [
{
"name": "address",
"value": "some adress"
},
{
"name": "city",
"value": "Chd"
},
{
"name": "zip_code",
"value": "160036"
}
]
},
{
"name": "X",
"last_name": "Y",
"special_values": [
{
"name": "adress",
"value": "some adress"
},
{
"name": "city",
"value": "Dallas"
},
{
"name": "zip_code",
"value": "02431"
}
]
}
]
}
I want this output:-
|name|last_name|address|city |zip_code|FIELD7|
|----|---------|-------|------|--------|------|
|A |B |some |adress|chd |160036|
|X |Y |some |adress|Dallas |02431 |
I have tried this code but getting different output :-
df = spark.read.option("multiline","true").json(r"C:UsersLajoDownloadsspark_ex2_input.json")
from pyspark.sql.types import *
from pyspark.sql.functions import explode_outer,col
def flatten(df):
# compute Complex Fields (Lists and Structs) in Schema
complex_fields = dict([(field.name, field.dataType)
for field in df.schema.fields
if type(field.dataType) == ArrayType or type(field.dataType) == StructType])
while len(complex_fields)!=0:
col_name=list(complex_fields.keys())[0]
print ("Processing :"+col_name+" Type : "+str(type(complex_fields[col_name])))
# if StructType then convert all sub element to columns.
# i.e. flatten structs
if (type(complex_fields[col_name]) == StructType):
expanded = [col(col_name+'.'+k).alias(col_name+'_'+k) for k in [ n.name for n in complex_fields[col_name]]]
df=df.select("*", *expanded).drop(col_name)
# if ArrayType then add the Array Elements as Rows using the explode function
# i.e. explode Arrays
elif (type(complex_fields[col_name]) == ArrayType):
df=df.withColumn(col_name,explode_outer(col_name))
# recompute remaining Complex Fields in Schema
complex_fields = dict([(field.name, field.dataType)
for field in df.schema.fields
if type(field.dataType) == ArrayType or type(field.dataType) == StructType])
return df
df_flatten = flatten(df)
df_flatten.show()
2
Answers
in my opinion it is much easier with the library ‘Pandas‘ and less code is needed.
the result looks like this
One option is to flatten the data before making it into a data frame. Consider reading the JSON file with the built-in
json
library. Then you can perform the following operation on the resultingdata
object.Then you can create a data frame from the
data
object.