I have JSON data as follows:
json_data = [
{
"meta_info": {"id": 1, "name": "Alice"},
"data": [{"value": 100}, {"value": 200}]
},
{
"meta_info": {"id": 2, "name": "Bob"},
"data": []
},
{
"meta_info": {"id": 3, "name": "Charlie"}
# "data" key is missing
}
]
When I try to read it into a pd.DataFrame
with pd.json_normalize
:
df = pd.json_normalize(json_data, record_path='data', meta=['meta_info.id', 'meta_info.name'], errors='ignore')
I am getting the error:
KeyError: "Key 'data' not found. If specifying a record_path, all elements of data should have the path.
Based on json_data
input above, my desired output as a dataframe looks like this:
value | meta_info.id | meta_info.name |
---|---|---|
100.0 | 1 | Alice |
200.0 | 1 | Alice |
NaN | 2 | Bob |
NaN | 3 | Charlie |
2
Answers
For this, you should iterate through the data in python (
json_normalize
is not flexible enough to cover this):One approach could be to correct
json_data
before passing it topd.json_normalize
. Showing 2 options depending on python version (>= 3.5, >= 3.9):Python 3.5
Since python 3.5 (examples at PEP 448), we can update via dictionary unpacking (
**
):Output:
Explanation
The value for our "update" dict (i.e.,
**{'data': ..}
we determine as follows:d
), we trydict.get
and default to[]
, thus aligning dicts without ‘data’ key (your 3rd dict) with the dicts that have{'data': []}
(your 2nd dict).[]
, if so, we overwrite with[{'value': np.nan}]
(now applicable to both 2nd and 3rd dict), else we just keep the value as is (1st dict).{**d, **{'data': ...}}
is that the later values (i.e. the right dict) will update the left values (the left dict). Doing this inside a for loop, for eachd
injson_data
, we get the intermediate result:pd.json_normalize
. Make sure to addmeta
fields correctly. Not:meta=['meta_info.id', ...]
, but:meta=[['meta_info','id'], ...]
.Python 3.9
Since python 3.9 (examples at PEP 584) we can create
data
using the dict merge operator (|
):d
ifd.get('data', []) != []
(1st dict), else we updated
with{'data': [{'value': np.nan}]}
viad | {'data': [{'value': np.nan}]}
(2nd and 3rd dict).