
I’m working on a Python project where I need to merge two dictionaries. The result should contain each key only once, and when a key appears in both dictionaries its value should reflect both inputs. Here’s what I’ve tried:

dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
merged_dict = {**dict1, **dict2}

This approach silently overwrites the value of any duplicate key with the one from dict2 (here 'b' ends up as 3). I’m looking for a way to either prevent duplicates or handle them in a specific manner (e.g., by adding the values together or keeping the higher value).

I’ve searched for solutions, but most methods don’t address handling duplicates in the way I need. Is there an efficient Pythonic way to merge dictionaries while managing duplicates according to custom logic?

Environment: Python 3.8 on Ubuntu 20.04

3 Answers


  1. This is a longer way to do this than some of the ways listed in the comments, but here is a way you can do it with pandas:

    import pandas as pd

    # The original dicts
    dict1 = {'a': 1, 'b': 2}
    dict2 = {'b': 3, 'c': 4}

    # Concatenate the two lists of keys and the two lists of values
    concatekey = list(dict1.keys()) + list(dict2.keys())
    concatevals = list(dict1.values()) + list(dict2.values())

    # Add the keys and values to a dataframe (a shared key appears twice)
    keyValDF = pd.DataFrame({'Key': concatekey, 'Vals': concatevals})

    # Group the data by key; groupby already yields each key only once
    NewDF = keyValDF.groupby('Key').sum().reset_index()

    # Put the columns back into a new dict with sums for each key
    NewDict = dict(zip(NewDF['Key'], NewDF['Vals']))

    # This is the df that we build the dict from
    # (display() only exists in Jupyter/IPython, so print() is used here)
    print(NewDF)

    # This is the new dict
    print(NewDict)
    
    

    The printed dictionary will be:

    {'a': 1, 'b': 5, 'c': 4}

    If you want the maximum value instead of the sum, just change .sum() to .max() and the dictionary becomes:

    {'a': 1, 'b': 3, 'c': 4}
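    If you would rather not pull in pandas just for this, the same group-then-aggregate idea can be sketched with the standard library. This is a minimal sketch, not part of the answer above; the helper name merge_grouped and its agg parameter are hypothetical, and agg can be any function that reduces a list of values (sum, max, min, ...).

```python
from collections import defaultdict


def merge_grouped(d1: dict, d2: dict, agg=sum) -> dict:
    """Hypothetical helper: group values by key, then reduce each group with agg."""
    groups = defaultdict(list)
    # Collect every value seen for each key, in the order the dicts are given
    for d in (d1, d2):
        for key, value in d.items():
            groups[key].append(value)
    # Apply the aggregate (sum, max, min, ...) to each key's list of values
    return {key: agg(values) for key, values in groups.items()}


dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
print(merge_grouped(dict1, dict2))       # {'a': 1, 'b': 5, 'c': 4}
print(merge_grouped(dict1, dict2, max))  # {'a': 1, 'b': 3, 'c': 4}
```

    Like the pandas version, switching from sum to max is a one-word change, but this runs on plain Python 3.8 with no extra dependencies.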

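  2. This seems straightforward enough. Note that it adds dict2’s values into dict1 in place; loop over a copy (dict1.copy()) if you need to keep the original.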

    import json

    dict1 = {'a': 1, 'b': 2}
    dict2 = {'b': 3, 'c': 4}

    # Add each value from dict2 to dict1's value for that key (0 if the key is new)
    for key, value in dict2.items():
        dict1[key] = dict1.get(key, 0) + value

    print(json.dumps(dict1, indent=4))
    

    giving you:

    {
        "a": 1,
        "b": 5,
        "c": 4
    }
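    For the summing case specifically, collections.Counter from the standard library can do the same accumulation without modifying dict1. A sketch under one caveat: Counter.update() adds values and keeps zero or negative totals, whereas the + operator between two Counters drops any key whose total is not positive, so update() is the safer choice for arbitrary numbers.

```python
from collections import Counter

dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}

# Start from a copy of dict1 so the original dict is left untouched
totals = Counter(dict1)
# update() adds dict2's values to existing keys instead of replacing them
totals.update(dict2)

merged = dict(totals)
print(merged)  # {'a': 1, 'b': 5, 'c': 4}

# By contrast, the + operator discards keys whose total is zero or negative
print(Counter({'x': 2}) + Counter({'x': -2}))  # Counter()
```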
    
  3. Here is my variant of the solution:

    def handle_duplicates(v1, v2):
        # Implement how you would like to deal with duplicates
        return [v1, v2]


    def merge(d1: dict, d2: dict) -> dict:
        # {**d1, **d2} works on Python 3.8; on 3.9+ you can write d1 | d2
        merged = {**d1, **d2}
        for k in d1.keys() & d2.keys():
            merged[k] = handle_duplicates(d1[k], d2[k])

        return merged
    

    By unpacking both dicts into a new one we get a "pre-merged" dict that has the values from both d1 and d2, with d2 taking priority. (The d1 | d2 operator does the same thing, but it was added in Python 3.9, so on Python 3.8 it raises a TypeError.) Then, in the for loop, we find the keys present in both dictionaries with a set intersection and overwrite those keys with values computed the way we want.

    >>> merge({"a": 1, "b": 2}, {"a": 3})
    {'a': [1, 3], 'b': 2}
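    The key-intersection step works because dict key views are set-like, which is already true on Python 3.8. A small sketch of the view operations involved, using the same inputs as the example call:

```python
d1 = {"a": 1, "b": 2}
d2 = {"a": 3}

# Key views support set operators directly and return a plain set
shared = d1.keys() & d2.keys()      # keys present in both dicts
everything = d1.keys() | d2.keys()  # union of all keys
only_d1 = d1.keys() - d2.keys()     # keys that appear only in d1

print(shared)              # {'a'}
print(sorted(everything))  # ['a', 'b']
print(only_d1)             # {'b'}
```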
    

    Or, if you would prefer a one-liner, here you go:

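    def merge(d1: dict, d2: dict) -> dict:
        return {
            k: handle_duplicates(d1[k], d2[k])
            if k in d1 and k in d2
            # Avoid `d1.get(k) or d2.get(k)`: it skips falsy values such as 0
            else (d1[k] if k in d1 else d2[k])
            # Iterating {**d1, **d2} visits each key once, in insertion order
            for k in {**d1, **d2}
        }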
    

    But it is harder to read, so personally I would prefer the first option.
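    Both versions hard-code handle_duplicates. A possible variation is to pass the combining function in, so the question’s two examples (adding the values, keeping the higher value) become a one-argument change. This is a sketch that stays compatible with Python 3.8; the name merge_with and its combine parameter are hypothetical additions, not part of the answer above.

```python
import operator
from typing import Any, Callable, Dict


def merge_with(d1: Dict[Any, Any], d2: Dict[Any, Any],
               combine: Callable[[Any, Any], Any]) -> Dict[Any, Any]:
    """Merge two dicts, resolving shared keys with combine(v1, v2)."""
    merged = {**d1, **d2}  # new dict; d2's values win for now
    for k in d1.keys() & d2.keys():
        merged[k] = combine(d1[k], d2[k])
    return merged


dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
print(merge_with(dict1, dict2, operator.add))  # {'a': 1, 'b': 5, 'c': 4}
print(merge_with(dict1, dict2, max))           # {'a': 1, 'b': 3, 'c': 4}
```

    Passing handle_duplicates from the answer as combine reproduces its list-building behaviour.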
