I have a list of deeply nested JSON objects (shortened here), and I want to normalize them so that every object is flattened to a single level.
What I have:
[{'index': 'exp-000005',
'type': '_doc',
'id': 'jdaksjdlkj',
'score': 9.502488,
'source': {'user': {'resource_uuid': '123'},
'verb': 'REPLIED',
'resource': {'resource_uuid': '/home'},
'timestamp': '2022-01-20T08:14:00+00:00',
'with_result': {},
'in_context': {'screen_width': '3440',
'screen_height': '1440',
'build_version': '7235',
'clientid': '123',
'question': 'Hallo',
'request_time': '403',
'status': 'success',
'response': '[]',
'language': 'de'}}},
{'index': 'exp-000005',
'type': '_doc',
'id': 'dddddd',
'score': 9.502488,
'source': {'user': {'resource_uuid': '44444'},
'verb': 'REPLIED',
'resource': {'resource_uuid': '/home'},
'timestamp': '2022-01-20T08:14:10+00:00',
'with_result': {},
'in_context': {'screen_width': '3440',
'screen_height': '1440',
'build_version': '7235',
'clientid': '345',
'question': 'Ich brauche Hilfe',
'request_time': '111',
'status': 'success',
'response': '[{"recipientid":"789", "text":"Bitte sehr."}, {"recipientid":"888", "text":"Kann ich Ihnen noch mit etwas anderem behilflich sein?"}]',
'language': 'de'}}},
{'index': 'exp-000005',
'type': '_doc',
'id': 'jdhdgs',
'score': 9.502488,
'source': {'user': {'resource_uuid': '333'},
'verb': 'REPLIED',
'resource': {'resource_uuid': '/home'},
'timestamp': '2022-01-20T08:14:19+00:00',
'with_result': {},
'in_context': {'screen_width': '3440',
'screen_height': '1440',
'build_version': '7235',
'clientid': '007',
'question': 'Zertifikate',
'request_time': '121',
'status': 'success',
'response': '[{"recipientid":"345", "text":"Künstliche Intelligenz"}, {"recipientid":"123", "text":"Kann ich Ihnen noch mit etwas anderem behilflich sein?"}]',
'language': 'de'}}}]
What I want:
[{'index': 'exp-000005',
'type': '_doc',
'id': 'jdaksjdlkj',
'score': 9.502488,
'resource_uuid': '123',
'verb': 'REPLIED',
'resource_uuid': '/home',
'timestamp': '2022-01-20T08:14:00+00:00',
'with_result': {},
'screen_width': '3440',
'screen_height': '1440',
'build_version': '7235',
'clientid': '123',
'question': 'Hallo',
'request_time': '403',
'status': 'success',
'response': '[]',
'language': 'de'},
{'index': 'exp-000005',
'type': '_doc',
'id': 'dddddd',
'score': 9.502488,
'resource_uuid': '44444',
'verb': 'REPLIED',
'resource_uuid': '/home',
'timestamp': '2022-01-20T08:14:10+00:00',
'with_result': {},
'screen_width': '3440',
'screen_height': '1440',
'build_version': '7235',
'clientid': '345',
'question': 'Ich brauche Hilfe',
'request_time': '111',
'status': 'success',
'recipientid1': "789",
'text1': "Bitte sehr.",
'recipientid2': "888",
'text2': "Kann ich Ihnen noch mit etwas anderem behilflich sein?",
'language': 'de'},
{'index': 'exp-000005',
'type': '_doc',
'id': 'jdhdgs',
'score': 9.502488,
'resource_uuid': '333',
'verb': 'REPLIED',
'resource_uuid': '/home',
'timestamp': '2022-01-20T08:14:19+00:00',
'with_result': {},
'screen_width': '3440',
'screen_height': '1440',
'build_version': '7235',
'clientid': '007',
'question': 'Zertifikate',
'request_time': '121',
'status': 'success',
'recipientid1': "345",
'text1': "Künstliche Intelligenz",
'recipientid2': "123",
'text2': "Kann ich Ihnen noch mit etwas anderem behilflich sein?",
'language': 'de'}]
I am not sure whether to use a JSON flatten approach or pandas.json_normalize for this.
2 Answers
If the sub-objects (dictionaries) do not share any key names, you can simply unpack them.
Write a recursive function that unpacks the object and returns a clean, flat one. If you know the data only goes x layers deep, you can hard-code that many steps yourself; if the depth is unknown, the function has to be recursive.
Possibly link for explaining -> found via google search
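The recursive unpacking described above could be sketched as follows. The function name unpack and the sample record are made up for illustration, and the sketch assumes that no key name is reused across nesting levels (if one is, the later value silently wins).

```python
def unpack(obj):
    """Recursively merge nested dicts into one flat dict.

    Assumption: nested dicts never reuse a key name.
    Empty dicts are kept as plain values so keys such as
    'with_result' do not disappear from the result.
    """
    flat = {}
    for key, value in obj.items():
        if isinstance(value, dict) and value:
            # Non-empty dict: pull its items up to the current level.
            flat.update(unpack(value))
        else:
            flat[key] = value
    return flat


# Hypothetical, shortened record without duplicate key names.
record = {
    "id": "dddddd",
    "source": {
        "verb": "REPLIED",
        "with_result": {},
        "in_context": {"language": "de"},
    },
}

flat_record = unpack(record)
print(flat_record)
```

Because the function calls itself for every non-empty dictionary, it works for any nesting depth.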
In Python, each key in a dictionary must be unique. If you assign a value to a key that already exists in the dictionary, the new value overwrites the existing value for that key. This means that duplicate keys cannot exist in a dictionary.
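A short illustration of that overwriting behaviour, using the key name from the question (the variable names are only for this example):

```python
record = {"resource_uuid": "123"}

# Assigning to an existing key replaces its value; no second entry is created.
record["resource_uuid"] = "/home"
print(record)

# The same happens inside a dict literal: the last occurrence of the key wins.
literal = {"resource_uuid": "123", "resource_uuid": "/home"}
print(len(literal))
```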
Therefore, in the desired result you’re aiming for, having the key resource_uuid appear multiple times is not possible.
Changing the structure of the items in the list forces you to iterate through each item in the list. You can refer to the following function to see if the returned result aligns with your intention.
Try it out with your_list:
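The original code for this answer is not included here, so what follows is only a sketch of one way to do it, not the answerer's function. The names iter_leaves and flatten_record are hypothetical. It assumes that a key name occurring more than once in a record should be prefixed with its parent key (giving user_resource_uuid and resource_resource_uuid), and that response always holds a JSON array of objects. your_list is shortened to two records.

```python
import json
from collections import Counter


def iter_leaves(obj, path=()):
    """Yield (path, value) for every non-dict value; empty dicts count as values."""
    for key, value in obj.items():
        if isinstance(value, dict) and value:
            yield from iter_leaves(value, path + (key,))
        else:
            yield path + (key,), value


def flatten_record(record):
    """Flatten one record to a single level (sketch, see assumptions above)."""
    leaves = list(iter_leaves(record))
    name_counts = Counter(path[-1] for path, _ in leaves)
    flat = {}
    for path, value in leaves:
        # Keep the short name when it is unique; otherwise prefix the parent key.
        name = path[-1] if name_counts[path[-1]] == 1 else "_".join(path[-2:])
        if name == "response" and isinstance(value, str):
            # Assumption: 'response' is a JSON array of objects.
            items = json.loads(value)
            if not items:
                flat[name] = value  # keep an empty '[]' as it is
            for i, item in enumerate(items, start=1):
                for sub_key, sub_value in item.items():
                    flat[f"{sub_key}{i}"] = sub_value  # recipientid1, text1, ...
        else:
            flat[name] = value
    return flat


your_list = [
    {"id": "jdaksjdlkj",
     "source": {"user": {"resource_uuid": "123"},
                "resource": {"resource_uuid": "/home"},
                "with_result": {},
                "in_context": {"question": "Hallo", "response": "[]"}}},
    {"id": "dddddd",
     "source": {"user": {"resource_uuid": "44444"},
                "resource": {"resource_uuid": "/home"},
                "with_result": {},
                "in_context": {"question": "Ich brauche Hilfe",
                               "response": '[{"recipientid":"789", "text":"Bitte sehr."}]'}}},
]

result = [flatten_record(item) for item in your_list]
for row in result:
    print(row)
```

Prefixing only the colliding names keeps keys such as question and with_result short. Two top-level keys with the same name cannot occur in a dictionary in the first place, but a top-level key that clashes with a nested one would still need its own rule.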