
I have a JSON object that contains duplicate keys. I want to merge all the values of each duplicated key into a single string under that key, using ";" as the separator.

input:

{
"1061": "GROCERY",
"1073": "GM-HBC",
"4220": "PRODUCE",
"958": "MEAT",
"958": "DAIRY",
"958": "FROZEN"
}

desired output:

{
"1061": "GROCERY",
"1073": "GM-HBC",
"4220": "PRODUCE",
"958": "MEAT;DAIRY;FROZEN"
}

I copied the question and data from this (link), since it is exactly what I'm looking for, except that I want to do it in Python.

3 Answers


  1. Here’s a solution using itertools.groupby and json.loads(..., object_pairs_hook=...):

    from itertools import groupby
    import json
    
    def merge_duplicates(pairs):
        # Sort on the key only: sorted() is stable, so the values of a
        # duplicated key keep the order in which they appear in the JSON text.
        for key, duplicate_pairs in groupby(sorted(pairs, key=lambda x: x[0]), lambda x: x[0]):
            yield key, ';'.join(value for _, value in duplicate_pairs)
    
    def parse_with_duplicates(text):
        return json.loads(text, object_pairs_hook=lambda pairs: dict(merge_duplicates(pairs)))
    
    print(parse_with_duplicates("""
    {
    "1061": "GROCERY",
    "1073": "GM-HBC",
    "4220": "PRODUCE",
    "958": "MEAT",
    "958": "DAIRY",
    "958": "FROZEN"
    }
    """))
    # {'1061': 'GROCERY', '1073': 'GM-HBC', '4220': 'PRODUCE', '958': 'MEAT;DAIRY;FROZEN'}
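
For context on why a hook is needed at all, here is a minimal sketch of what `json.loads` does with duplicate keys by default and what `object_pairs_hook` receives. The helper `record_pairs` and the shortened input are illustrative only and not part of the answer above.

```python
import json

text = '{"958": "MEAT", "958": "DAIRY", "958": "FROZEN"}'

# Without a hook, the decoder builds a plain dict,
# so only the last value of a repeated key survives.
plain = json.loads(text)
print(plain)  # {'958': 'FROZEN'}

# Hypothetical helper: records the pair list it is given, then behaves like dict().
seen = []

def record_pairs(pairs):
    seen.append(pairs)
    return dict(pairs)

json.loads(text, object_pairs_hook=record_pairs)
print(seen)  # [[('958', 'MEAT'), ('958', 'DAIRY'), ('958', 'FROZEN')]]
```

The hook sees every (key, value) pair in document order, duplicates included, which is what makes merging possible.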
    
  2. You can use the multidict library to create a MultiDict, which is like a regular dictionary but can hold multiple values for the same key.

    import multidict
    
    data = multidict.MultiDict([
       ("1061", "GROCERY"),
       ("1073", "GM-HBC"),
       ("4220", "PRODUCE"),
       ("958", "MEAT"),
       ("958", "DAIRY"),
       ("958", "FROZEN")
    ])
    
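    safe_dict = {}
    
    # keys() yields a duplicated key once per occurrence; reassigning the
    # same joined string for a repeated key is harmless.
    for key in data.keys():
       safe_dict[key] = ";".join(data.getall(key))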
    
    print(safe_dict)
    
    # {'1061': 'GROCERY', '1073': 'GM-HBC', '4220': 'PRODUCE', '958': 'MEAT;DAIRY;FROZEN'}
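
The example above builds the pair list by hand. If the data starts out as JSON text, one possible way to obtain such a list using only the standard library is to pass `list` as the `object_pairs_hook`. This is a sketch under the assumption that the JSON is a single flat object; nested objects would also come back as lists of pairs. The variable names are illustrative.

```python
import json

json_text = '{"958": "MEAT", "958": "DAIRY", "1061": "GROCERY"}'

# With list as the hook, the JSON object is returned as its raw
# list of (key, value) tuples instead of being collapsed into a dict.
pairs = json.loads(json_text, object_pairs_hook=list)
print(pairs)  # [('958', 'MEAT'), ('958', 'DAIRY'), ('1061', 'GROCERY')]
```

A list in this shape has the same form as the one passed to `multidict.MultiDict(...)` in the answer above.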
    
  3. Here is a solution similar to @BoppreH's that does not use itertools.

    import json
    import pprint
    
    
    def consolidate_duplicate_keys(list_of_pairs):
        # Add all values to a list
        result = dict()
        for k, v in list_of_pairs:
            result.setdefault(k, []).append(v)
    
        # Convert values from lists to strings
        result = {k: ";".join(v) for k, v in result.items()}
        return result
    
    
    def main():
        json_text = """
        {
            "1061": "GROCERY",
            "1073": "GM-HBC",
            "4220": "PRODUCE",
            "958": "MEAT",
            "958": "DAIRY",
            "958": "FROZEN"
        }
        """
        obj = json.loads(json_text, object_pairs_hook=consolidate_duplicate_keys)
        pprint.pprint(obj)
    
    
    if __name__ == "__main__":
        main()
    

    Output:

    {'1061': 'GROCERY',
     '1073': 'GM-HBC',
     '4220': 'PRODUCE',
     '958': 'MEAT;DAIRY;FROZEN'}
    

    Notes

    The setdefault() method inserts the key with the given default value if the key is not already in the dictionary. If the key already exists, the dictionary is left unchanged. In both cases, setdefault() returns the value stored under the key. Thus the code

    result.setdefault(k, []).append(v)
    

    is a shorthand for

    if k not in result:
        result[k] = []
    result[k].append(v)
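
As an alternative to setdefault(), the same grouping can be sketched with `collections.defaultdict`, which creates the empty list automatically on first access. This is a variation on the answer above, not part of it; the function name `consolidate_with_defaultdict` and the shortened input are illustrative.

```python
import json
from collections import defaultdict


def consolidate_with_defaultdict(list_of_pairs):
    # A missing key is initialised to an empty list on first access
    grouped = defaultdict(list)
    for key, value in list_of_pairs:
        grouped[key].append(value)

    # Convert values from lists to ";"-separated strings
    return {key: ";".join(values) for key, values in grouped.items()}


merged = json.loads(
    '{"958": "MEAT", "958": "DAIRY", "1061": "GROCERY"}',
    object_pairs_hook=consolidate_with_defaultdict,
)
print(merged)  # {'958': 'MEAT;DAIRY', '1061': 'GROCERY'}
```

Like the setdefault() version, this assumes every value is a string; `";".join` raises a TypeError for other value types.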
    