
I ran the following Python script to remove duplicates between two JSON files based on userid:

    import json

    with open("target_user.json", "r", encoding='utf-8') as f1:
        target = json.load(f1)
    with open("source_user.json", "r", encoding='utf-8') as f2:
        source = json.load(f2)
            
    target2 = []
    for item in target:
        if item['userid'] not in [x['userid'] for x in source]:
            target2.append(item)
            
    with open('target_user2.json', 'w', encoding='utf-8') as nf:
        json.dump(target2, nf, indent=4)
        

It raises this error:

    TypeError: list indices must be integers or slices, not str

from this line:

    if item['userid'] not in [x['userid'] for x in source]:

I have tried the following, but it does not fix the error:

    if dict([item]['userid']) not in [dict[x]['userid'] for x in source]:    

Here is a sample of my json file:

    {
        "567897068": {
            "userid": "567897068",
            "status": "UserStatus.OFFLINE",
            "name": "btb appeal",
            "bot": false,
            "username": "None"
        },
        "5994781619": {
            "userid": "5994781619",
            "status": "UserStatus.OFFLINE",
            "name": "Ahh",
            "bot": false,
            "username": "hourng999"
        },
        "1873973169": {
            "userid": "1873973169",
            "status": "UserStatus.RECENTLY",
            "name": "Chanthy",
            "bot": false,
            "username": "Alis_Thy"
        }
    }

Any help is appreciated.

I have also tried the following, as advised by John, but it does not work either:

    import json

    with open("target_user.json", "r", encoding='utf-8') as f1:
        target = json.load(f1)
    with open("source_user.json", "r", encoding='utf-8') as f2:
        source = json.load(f2)
            
    target2 = []
    for item in target.values():
       # print(item) # print scanned results on CMD
        #if item['userid'] not in [x['userid'] for x in source]:
        #if item['userid'] not in [x['userid'] for x in source]:
        if target[item]['userid'] not in [source[x]['userid'] for x in source]:
            target2.append(item)
            
    with open('target_user2.json', 'w', encoding='utf-8') as nf:
        json.dump(target2, nf, indent=4)

3 Answers


  1. Chosen as BEST ANSWER

    I found out that there was an error in the second JSON file, which is what caused the error mentioned above. My original code works fine. Thanks.
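
For anyone hitting the same message, a quick structural check of each file can locate the offending entry. The following is only a sketch: the helper name `find_malformed_entries` and the sample data are hypothetical, and it assumes every entry should be a dictionary containing a "userid" key.

```python
import json

def find_malformed_entries(records):
    """Return (position, entry) pairs for entries that are not dicts with a 'userid' key."""
    # Accept either a list of user dicts or a dict keyed by userid.
    entries = records.values() if isinstance(records, dict) else records
    return [
        (pos, entry)
        for pos, entry in enumerate(entries)
        if not (isinstance(entry, dict) and "userid" in entry)
    ]

# Hypothetical data: the second entry is a stray nested list, the kind of
# element that raises "list indices must be integers or slices, not str"
# when it is indexed with ['userid'].
sample = json.loads('[{"userid": "1"}, [{"userid": "2"}], {"userid": "3"}]')
bad = find_malformed_entries(sample)
print(bad)
```

Running the same check on the data loaded from source_user.json and target_user.json would show which file, and which position in it, needs fixing.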


  2. target is a dictionary.

    When you iterate over a dictionary with for item in target, it returns the keys of the dictionary, which are strings.

    So on the first loop iteration, item is the string "567897068", which causes the error.

    I think you intended to iterate over the values of the dictionary instead of the keys:

    for item in target.values():
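
The difference between the two loops can be seen in a small sketch. It uses a trimmed copy of the sample data from the question; the variable names `keys`, `values`, and `failed` are only for illustration.

```python
# Trimmed copy of the sample from the question: a dict keyed by userid.
target = {
    "567897068": {"userid": "567897068", "name": "btb appeal"},
    "5994781619": {"userid": "5994781619", "name": "Ahh"},
}

# Iterating over the dict itself yields its keys (strings).
keys = [item for item in target]

# Iterating over .values() yields the inner dictionaries.
values = [item for item in target.values()]

# Indexing a key with ['userid'] fails, because a string cannot be
# indexed by another string.
try:
    keys[0]["userid"]
    failed = False
except TypeError:
    failed = True

print(keys)
print([v["userid"] for v in values])
print(failed)
```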
    
  3. target is a nested dictionary. Iterating with for item in target is the same as for item in target.keys(), so item is a key. The content you are looking for is the value stored under that key, which is the inner dictionary. Only in that inner dictionary is "userid" a key.
    You can access it as target[item]["userid"] rather than item["userid"], like this:

    for item in target:
        # item is a key here, so look up the inner dictionary explicitly
        if target[item]['userid'] not in [source[x]['userid'] for x in source]:
            target2.append(target[item])
    

    Another way, as mentioned in the previous answer, is to use for item in target.values(), so that item is the inner dictionary and item["userid"] works directly. Don't mix this with the previous approach.

    for item in target.values():
        if item['userid'] not in [x['userid'] for x in source.values()]:
            target2.append(item)
    

    Moreover, if source_user.json has the same structure as target_user.json, then [x['userid'] for x in source] will have the same issue, because x will be a key rather than an inner dictionary. Both snippets above already account for this.
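
As a side note, the list comprehension in the condition is rebuilt for every item in target. A possible variant collects the source IDs into a set once. This is a sketch, not the asker's code: the function name `remove_duplicates` is hypothetical, it assumes both files are dictionaries keyed by userid as in the sample, and it returns a dictionary instead of the list used in the question.

```python
def remove_duplicates(target, source):
    """Return the entries of target whose userid does not appear in source."""
    # Build the set of source IDs once; membership tests on a set are fast.
    source_ids = {user["userid"] for user in source.values()}
    return {
        key: user
        for key, user in target.items()
        if user["userid"] not in source_ids
    }

# Hypothetical data in the same shape as the sample from the question.
target = {
    "567897068": {"userid": "567897068", "name": "btb appeal"},
    "5994781619": {"userid": "5994781619", "name": "Ahh"},
}
source = {
    "5994781619": {"userid": "5994781619", "name": "Ahh"},
}

target2 = remove_duplicates(target, source)
print(target2)
```

If a list is needed for the output file, as in the question, list(target2.values()) gives the same entries in list form.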
