
I have a list of deeply nested JSON objects (shortened here), and I want to normalize them so that every field sits on a single level.

What I have:

[{'index': 'exp-000005',
  'type': '_doc',
  'id': 'jdaksjdlkj',
  'score': 9.502488,
  'source': {'user': {'resource_uuid': '123'},
   'verb': 'REPLIED',
   'resource': {'resource_uuid': '/home'},
   'timestamp': '2022-01-20T08:14:00+00:00',
   'with_result': {},
   'in_context': {'screen_width': '3440',
    'screen_height': '1440',
    'build_version': '7235',
    'clientid': '123',
    'question': 'Hallo',
    'request_time': '403',
    'status': 'success',
    'response': '[]',
    'language': 'de'}}},
 {'index': 'exp-000005',
  'type': '_doc',
  'id': 'dddddd',
  'score': 9.502488,
  'source': {'user': {'resource_uuid': '44444'},
   'verb': 'REPLIED',
   'resource': {'resource_uuid': '/home'},
   'timestamp': '2022-01-20T08:14:10+00:00',
   'with_result': {},
   'in_context': {'screen_width': '3440',
    'screen_height': '1440',
    'build_version': '7235',
    'clientid': '345',
    'question': 'Ich brauche Hilfe',
    'request_time': '111',
    'status': 'success',
    'response': '[{"recipientid":"789", "text":"Bitte sehr."}, {"recipientid":"888", "text":"Kann ich Ihnen noch mit etwas anderem behilflich sein?"}]',
    'language': 'de'}}},
 {'index': 'exp-000005',
  'type': '_doc',
  'id': 'jdhdgs',
  'score': 9.502488,
  'source': {'user': {'resource_uuid': '333'},
   'verb': 'REPLIED',
   'resource': {'resource_uuid': '/home'},
   'timestamp': '2022-01-20T08:14:19+00:00',
   'with_result': {},
   'in_context': {'screen_width': '3440',
    'screen_height': '1440',
    'build_version': '7235',
    'clientid': '007',
    'question': 'Zertifikate',
    'request_time': '121',
    'status': 'success',
    'response': '[{"recipientid":"345", "text":"Künstliche Intelligenz"}, {"recipientid":"123", "text":"Kann ich Ihnen noch mit etwas anderem behilflich sein?"}]',
    'language': 'de'}}}]

What I want:

[{'index': 'exp-000005',
  'type': '_doc',
  'id': 'jdaksjdlkj',
  'score': 9.502488,
  'resource_uuid': '123',
  'verb': 'REPLIED',
  'resource_uuid': '/home',
  'timestamp': '2022-01-20T08:14:00+00:00',
  'with_result': {},
  'screen_width': '3440',
  'screen_height': '1440',
  'build_version': '7235',
  'clientid': '123',
  'question': 'Hallo',
  'request_time': '403',
  'status': 'success',
  'response': '[]',
  'language': 'de'},
 {'index': 'exp-000005',
  'type': '_doc',
  'id': 'dddddd',
  'score': 9.502488,
  'resource_uuid': '44444',
  'verb': 'REPLIED',
  'resource_uuid': '/home',
  'timestamp': '2022-01-20T08:14:10+00:00',
  'with_result': {},
  'screen_width': '3440',
  'screen_height': '1440',
  'build_version': '7235',
  'clientid': '345',
  'question': 'Ich brauche Hilfe',
  'request_time': '111',
  'status': 'success',
  'recipientid1': "789", 
  'text1': "Bitte sehr",
  'recipientid2': "888", 
  'text2': "Kann ich Ihnen noch mit etwas anderem behilflich sein?",
  'language': 'de'},
 {'index': 'exp-000005',
  'type': '_doc',
  'id': 'jdhdgs',
  'score': 9.502488,
  'resource_uuid': '333',
  'verb': 'REPLIED',
  'resource_uuid': '/home',
  'timestamp': '2022-01-20T08:14:19+00:00',
  'with_result': {},
  'screen_width': '3440',
  'screen_height': '1440',
  'build_version': '7235',
  'clientid': '007',
  'question': 'Zertifikate',
  'request_time': '121',
  'status': 'success',
  'recipientid1': "345", 
  'text1': "Künstliche Intelligenz",
  'recipientid2': "123", 
  'text2': "Kann ich Ihnen noch mit etwas anderem behilflich sein?",
  'language': 'de'}]

I am not sure whether to use a JSON-flattening helper or pandas json_normalize.

2 Answers


  1. If the nested dictionaries do not share any key names, you can simply unpack them into the parent.

    Write a recursive function that unpacks each level and returns a flat object. If you know the data is only a fixed number of levels deep, you can hard-code those levels; if the depth is unknown, the function has to be recursive.

    Possibly link for explaining -> found via google search
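
The recursive unpacking described in this answer could look roughly like the sketch below. The function name `unpack` and the choice to raise an error on a repeated key are assumptions made for illustration; they are not part of the answer itself. Empty dictionaries such as `with_result` are kept as values rather than unpacked.

```python
def unpack(d, out=None):
    """Recursively lift the keys of every nested dict to the top level.

    Raises ValueError on a repeated key instead of silently overwriting it.
    """
    if out is None:
        out = {}
    for key, value in d.items():
        if isinstance(value, dict) and value:
            unpack(value, out)  # descend to any depth
        elif key in out:
            raise ValueError(f"duplicate key: {key!r}")
        else:
            out[key] = value  # plain value, or an empty dict kept as-is
    return out


record = {"id": "a", "source": {"verb": "REPLIED", "in_context": {"language": "de"}}}
flat = unpack(record)
print(flat)
```

With the data from the question this sketch would raise, because `resource_uuid` appears under both `user` and `resource`; that is exactly the duplicated-naming case the answer warns about.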

  2. In Python, each key in a dictionary must be unique. If you assign a value to a key that already exists in the dictionary, the new value overwrites the existing one. This means that duplicate keys cannot exist in a dictionary.

    Therefore, in the desired result you’re aiming for, having the key resource_uuid appear multiple times is not possible.

    Changing the structure of the items in the list forces you to iterate through each item in the list. You can refer to the following function to see if the returned result aligns with your intention.

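    def flatten_dict(d, parent_key='', sep='_'):
        """Flatten nested dicts into one level; list items get an index suffix."""
        items = []
        for k, v in d.items():
            if isinstance(v, dict):
                # Nested dict keys are lifted without adding k as a prefix, so
                # equal names (e.g. both resource_uuid keys) overwrite each other.
                # An empty dict (e.g. with_result) contributes no keys at all.
                items.extend(flatten_dict(v, parent_key, sep).items())
                continue
            new_key = parent_key + sep + k if parent_key else k
            if isinstance(v, list):
                for i, item in enumerate(v):
                    if isinstance(item, dict):
                        # Prefix with "<key><sep><index>" so list entries stay distinct.
                        items.extend(flatten_dict(item, f"{new_key}{sep}{i}", sep).items())
                    else:
                        items.append((f"{new_key}{sep}{i}", item))
                continue
            items.append((new_key, v))
        return dict(items)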
    

    Try it out with your_list:

    flattened_list = []
    for d in your_list:
        flattened_list.append(flatten_dict(d))
    print(flattened_list)
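
Note that in the question's data `response` is a JSON-encoded string, not a Python list, so a dictionary flattener leaves it untouched. A minimal sketch of how the numbered `recipientid1`, `text1`, ... keys from the desired output might be produced is shown below. The helper name `expand_response` is hypothetical, and the sketch assumes the string always decodes to a list of flat objects.

```python
import json


def expand_response(flat, key="response"):
    """Replace a JSON-encoded list under `key` with numbered keys.

    '[{"recipientid": "789", "text": "Hi"}]' becomes recipientid1 and text1.
    """
    out = {}
    for k, v in flat.items():
        if k != key:
            out[k] = v
            continue
        entries = json.loads(v)
        if not entries:
            out[k] = v  # keep an empty '[]' unchanged, as in the first desired record
            continue
        for n, entry in enumerate(entries, start=1):
            for field, value in entry.items():
                out[f"{field}{n}"] = value
    return out


row = {
    "status": "success",
    "response": '[{"recipientid":"789", "text":"Bitte sehr."}, '
                '{"recipientid":"888", "text":"Kann ich Ihnen noch mit etwas anderem behilflich sein?"}]',
    "language": "de",
}
expanded = expand_response(row)
print(expanded)
```

It can be applied to each dictionary after flattening, for example `expand_response(flatten_dict(d))` for every `d` in `your_list`.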
    