I have run the following Python script to remove duplicates between two JSON files based on userid:
import json
with open("target_user.json", "r", encoding='utf-8') as f1:
    target = json.load(f1)
with open("source_user.json", "r", encoding='utf-8') as f2:
    source = json.load(f2)
target2 = []
for item in target:
    if item['userid'] not in [x['userid'] for x in source]:
        target2.append(item)
with open('target_user2.json', 'w', encoding='utf-8') as nf:
    json.dump(target2, nf, indent=4)
It generates errors:
TypeError: list indices must be integers or slices, not str
from this line:
if item['userid'] not in [x['userid'] for x in source]:
I have tried the following, but it does not fix the error:
if dict([item]['userid']) not in [dict[x]['userid'] for x in source]:
Here is a sample of my json file:
{
    "567897068": {
        "userid": "567897068",
        "status": "UserStatus.OFFLINE",
        "name": "btb appeal",
        "bot": false,
        "username": "None"
    },
    "5994781619": {
        "userid": "5994781619",
        "status": "UserStatus.OFFLINE",
        "name": "Ahh",
        "bot": false,
        "username": "hourng999"
    },
    "1873973169": {
        "userid": "1873973169",
        "status": "UserStatus.RECENTLY",
        "name": "Chanthy",
        "bot": false,
        "username": "Alis_Thy"
    }
}
Any help is appreciated.
I have tried the following, as advised by John, but it does not work:
import json
with open("target_user.json", "r", encoding='utf-8') as f1:
    target = json.load(f1)
with open("source_user.json", "r", encoding='utf-8') as f2:
    source = json.load(f2)
target2 = []
for item in target.values():
    # print(item) # print scanned results on CMD
    #if item['userid'] not in [x['userid'] for x in source]:
    #if item['userid'] not in [x['userid'] for x in source]:
    if target[item]['userid'] not in [source[x]['userid'] for x in source]:
        target2.append(item)
with open('target_user2.json', 'w', encoding='utf-8') as nf:
    json.dump(target2, nf, indent=4)
3 Answers
I found out that there is an error in the second JSON file, which is why the error occurred. My original code works fine. Thanks.
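The answer above does not say what the error in the second file was. As a rough sketch only, a check like the following could help locate entries that do not match the structure shown in the question. The helper name `find_malformed_entries` and the sample data are hypothetical, and the assumption that a list appeared where a dictionary was expected is a guess based on the error message.

```python
def find_malformed_entries(data):
    """Return a list of human-readable problems found in `data`.

    Hypothetical helper: it assumes the expected structure is a dict that
    maps each user ID to an inner dict containing a "userid" key.
    """
    if not isinstance(data, dict):
        return [f"top level is {type(data).__name__}, expected dict"]
    problems = []
    for key, value in data.items():
        if not isinstance(value, dict):
            problems.append(f"{key}: value is {type(value).__name__}, expected dict")
        elif "userid" not in value:
            problems.append(f"{key}: missing 'userid'")
    return problems


# Made-up data: one well-formed entry and two malformed ones.
sample_source = {
    "567897068": {"userid": "567897068", "name": "btb appeal"},
    "5994781619": [{"userid": "5994781619"}],   # a list instead of a dict
    "1873973169": {"name": "Chanthy"},          # no "userid" key
}

problems = find_malformed_entries(sample_source)
for problem in problems:
    print(problem)
```

Running such a check on both loaded files before comparing them would show which entries trigger the TypeError.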
`target` is a dictionary. When you iterate over a dictionary with `for item in target`, it returns the keys of the dictionary, which are strings. So on the first loop iteration, `item` is the string "567897068", which causes the error. I think you intended to iterate over the values of the dictionary instead of the keys, using `for item in target.values()`.
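Iterating over the values could look roughly like the sketch below. To keep it self-contained, the two files are replaced by inline JSON strings that are assumed to have the same structure as the sample in the question. Collecting the source IDs into a set first is an addition of this sketch, not part of the original script.

```python
import json

# Inline stand-ins for target_user.json and source_user.json (assumed to
# share the structure of the sample: a dict keyed by user ID).
target = json.loads("""
{
    "567897068": {"userid": "567897068", "name": "btb appeal"},
    "5994781619": {"userid": "5994781619", "name": "Ahh"},
    "1873973169": {"userid": "1873973169", "name": "Chanthy"}
}
""")
source = json.loads("""
{
    "5994781619": {"userid": "5994781619", "name": "Ahh"}
}
""")

# Collect the source user IDs once instead of rebuilding a list per iteration.
source_ids = {entry["userid"] for entry in source.values()}

# Iterate over the values (the inner dictionaries), not the keys.
target2 = [item for item in target.values() if item["userid"] not in source_ids]

print(json.dumps(target2, indent=4))
```

Here `target2` is a list of the inner dictionaries, which matches what the original script writes to target_user2.json.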
`target` is a nested dictionary. Iterating with `for item in target` is the same as `for item in target.keys()`, so `item` is a key. The content you are looking for is the value stored under that key, which is the inner dictionary; only in that inner dictionary is "userid" a key. You can access it with `target[item]["userid"]`, not `item["userid"]`.

Another way to do it, as mentioned in the previous answer, is to use `for item in target.values()`, so that `item` becomes the inner dictionary and gives access to "userid". Don't mix this with the previous solution.

Moreover, if source_user.json has the same structure as target_user.json, then `[x['userid'] for x in source]` will have the same issue; the same key-based lookup, `[source[x]['userid'] for x in source]`, deals with it.
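The key-based approach described in this answer might be sketched as follows. The function name `remove_duplicates_by_key` and the inline sample data are hypothetical, and both inputs are assumed to be dictionaries keyed by user ID, as in the sample from the question.

```python
def remove_duplicates_by_key(target, source):
    """Return the entries of `target` whose userid does not occur in `source`.

    Both arguments are assumed to map a user ID to an inner dict
    that contains a "userid" key.
    """
    # x is a key here, so the inner dict is reached through source[x].
    source_ids = [source[x]["userid"] for x in source]
    result = []
    for item in target:  # item is a key, not an inner dict
        if target[item]["userid"] not in source_ids:
            result.append(target[item])
    return result


# Made-up data in the same shape as the sample JSON.
target = {
    "567897068": {"userid": "567897068", "name": "btb appeal"},
    "5994781619": {"userid": "5994781619", "name": "Ahh"},
    "1873973169": {"userid": "1873973169", "name": "Chanthy"},
}
source = {
    "1873973169": {"userid": "1873973169", "name": "Chanthy"},
}

target2 = remove_duplicates_by_key(target, source)
print([entry["userid"] for entry in target2])
```

Note that the loop variable is only ever used as a key; combining `target.values()` with `target[item]` fails because a dictionary cannot be used as a key.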