I have been able to print the JSON data I want to work on to the console. How do I split the ‘value’ column into separate columns matching the fields in the JSON, and then write the result to Delta Lake for SQL queries and MLlib? Thanks.
{"coord": {"lon": -1.15, "lat": 52.95}, "list": [{"main": {"aqi": 2}, "components": {"co": 220.3, "no": 0.26, "no2": 5.14, "o3": 75.1, "so2": 1.54, "pm2_5": 1.8, "pm10": 2.71, "nh3": 2.79}, "dt": 1679418000}, {"main": {"aqi": 2}, "components": {"co": 220.3, "no": 0.07, "no2": 7.45, "o3": 72.24, "so2": 2.18, "pm2_5": 1.9, "pm10": 2.9, "nh3": 3.45}, "dt": 1679421600}]}
2 Answers
I defined an ArrayType-of-struct schema for the JSON value I wanted to explode.
Then I created a DataFrame with that schema.
Finally, I extracted the values of the exploded column into separate columns.
Use get_json_object for each field you want. For the list field, you need to explode it.