I have JSON files in this generic format:
{"attribute1": "test1",
"attribute2": "test2",
"data": {
"0":
{"metadata": {
"timestamp": "2022-08-14"},
"detections": {
"0": {"dim1": 40, "dim2": 30},
"1": {"dim1": 50, "dim2": 20}}},
"1":
{"metadata": {
"timestamp": "2022-08-15"},
"detections": {
"0": {"dim1": 30, "dim2": 10},
"1": {"dim1": 100, "dim2": 80}}}}}
These JSON files hold collections of measurements from a 3D camera. The top-level entries under the `data` key correspond to frames; each frame has its own `metadata` and can contain multiple `detections` objects, each with its own dimensions (represented here by `dim1` and `dim2`). I want to convert this type of JSON file into a pandas DataFrame in the following format:
| timestamp  | dim1 | dim2 |
|------------|------|------|
| 2022-08-14 | 40   | 30   |
| 2022-08-14 | 50   | 20   |
| 2022-08-15 | 30   | 10   |
| 2022-08-15 | 100  | 80   |
So any field in `metadata` (here I only included `timestamp`, but there could be several) must be repeated for each entry under the `detections` key.
I can convert this type of JSON to a pandas DataFrame, but it takes multiple steps and `for` loops within a single file, concatenating everything at the end. I have also tried `pd.json_normalize`, playing with the `record_path`, `meta`, and `max_level` arguments, but so far I have not been able to convert this type of JSON to a DataFrame in a few steps. Is there a clean way to do this?
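For reference, the loop-based conversion described above looks something like this (a sketch only; the file name is hypothetical):

```python
import json

import pandas as pd

with open("measurements.json") as f:  # hypothetical file name
    data = json.load(f)

frames = []
for frame in data["data"].values():
    # Build one small DataFrame per frame from its detections...
    df = pd.DataFrame(list(frame["detections"].values()))
    # ...then repeat every metadata field across that frame's rows.
    for key, value in frame["metadata"].items():
        df[key] = value
    frames.append(df)

result = pd.concat(frames, ignore_index=True)
```

It works, but it is exactly the multi-step pattern I would like to avoid.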
2 Answers
I think a good solution could be: use a nested dictionary comprehension to flatten the values and merge the subdictionaries, then pass the result to the `DataFrame` constructor.
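A minimal sketch of that idea, assuming the input has already been parsed into a dict `data` (e.g. via `json.load`); the `{**...}` unpacking inside the nested comprehension merges each frame's `metadata` dict into each of its detection dicts:

```python
import pandas as pd

# Parsed JSON from the question (in practice: data = json.load(f))
data = {
    "attribute1": "test1",
    "attribute2": "test2",
    "data": {
        "0": {"metadata": {"timestamp": "2022-08-14"},
              "detections": {"0": {"dim1": 40, "dim2": 30},
                             "1": {"dim1": 50, "dim2": 20}}},
        "1": {"metadata": {"timestamp": "2022-08-15"},
              "detections": {"0": {"dim1": 30, "dim2": 10},
                             "1": {"dim1": 100, "dim2": 80}}},
    },
}

# For every frame, merge its metadata into each detection,
# yielding one flat record per detection.
rows = [{**frame["metadata"], **det}
        for frame in data["data"].values()
        for det in frame["detections"].values()]

df = pd.DataFrame(rows)
print(df)
```

The resulting frame has one row per detection, with every `metadata` field (here just `timestamp`) repeated across that frame's detections, matching the table in the question.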