I want to merge 2 JSON files into one JSON file and remove all duplicate rows based on one column (the userid column). At the moment I merge two or more JSON files manually, then I use Python code to remove all rows with a duplicated userid.
First json file:
[
{
"userid": "567897068",
"status": "UserStatus.RECENTLY",
"name": "btb appeal court",
"bot": false,
"username": "None"
},
{
"userid": "6403980168",
"status": "UserStatus.RECENTLY",
"name": "Ah",
"bot": false,
"username": "fearpic"
},
{
"userid": "7104649590",
"status": "UserStatus.RECENTLY",
"name": "Da",
"bot": false,
"username": "Abc130000"
},
{
"userid": "5813962086",
"status": "UserStatus.RECENTLY",
"name": "Sothea",
"bot": false,
"username": "SotheaSopheap169"
}
]
Second json file:
[
{
"userid": "567897068",
"status": "UserStatus.RECENTLY",
"name": "btb appeal court",
"bot": false,
"username": "None"
},
{
"userid": "111111111111",
"status": "UserStatus.RECENTLY",
"name": "Ah",
"bot": false,
"username": "fearpic"
},
{
"userid": "7104649590",
"status": "UserStatus.RECENTLY",
"name": "Da",
"bot": false,
"username": "Abc130000"
},
{
"userid": "555555555555",
"status": "UserStatus.RECENTLY",
"name": "Sothea",
"bot": false,
"username": "SotheaSopheap169"
}
]
merged file should be:
[
{
"userid": "567897068",
"status": "UserStatus.RECENTLY",
"name": "btb appeal court",
"bot": false,
"username": "None"
},
{
"userid": "6403980168",
"status": "UserStatus.RECENTLY",
"name": "Ah",
"bot": false,
"username": "fearpic"
},
{
"userid": "7104649590",
"status": "UserStatus.RECENTLY",
"name": "Da",
"bot": false,
"username": "Abc130000"
},
{
"userid": "5813962086",
"status": "UserStatus.RECENTLY",
"name": "Sothea",
"bot": false,
"username": "SotheaSopheap169"
},
{
"userid": "111111111111",
"status": "UserStatus.RECENTLY",
"name": "Ah",
"bot": false,
"username": "fearpic"
},
{
"userid": "555555555555",
"status": "UserStatus.RECENTLY",
"name": "Sothea",
"bot": false,
"username": "SotheaSopheap169"
}
]
I have used the following code to remove duplicates based on the userid column in a JSON file I merged manually:
import json

with open('source_user_all.json', 'r', encoding='utf-8') as f:
    jsons = json.load(f)

ids = set()
jsons2 = []
for item in jsons:
    if item['userid'] not in ids:
        ids.add(item['userid'])
        jsons2.append(item)

with open('source_user.json', 'w', encoding='utf-8') as nf:
    json.dump(jsons2, nf, indent=4)
The above works well.
Is there an easy way to merge multiple JSON files and remove all duplicates based on a column before writing to a single output file?
Thanks
2 Answers
You just need to build a dictionary by looping over your input files.
You could remove the conditional check (`not uid in td`) if you don't care which duplicate is removed.
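The answer's original code block did not survive, but the approach it describes can be sketched as follows. The dict name `td`, the variable `uid`, and the file names are assumptions (only `td` and `uid` are hinted at by the answer text); the sample files are created inline just so the sketch is self-contained.

```python
import json

# Create two tiny sample files so this sketch is self-contained
# (in practice you would already have your JSON files on disk).
file1 = [{"userid": "1", "name": "a"}, {"userid": "2", "name": "b"}]
file2 = [{"userid": "1", "name": "a"}, {"userid": "3", "name": "c"}]
for path, data in [("part1.json", file1), ("part2.json", file2)]:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)

# Build one dictionary keyed by userid while looping over the input
# files; the conditional keeps the first occurrence of each id.
td = {}
for path in ["part1.json", "part2.json"]:
    with open(path, "r", encoding="utf-8") as f:
        for item in json.load(f):
            uid = item["userid"]
            if uid not in td:
                td[uid] = item

merged = list(td.values())
with open("merged.json", "w", encoding="utf-8") as nf:
    json.dump(merged, nf, indent=4)
```

Since a dict can hold only one value per key, deduplication falls out of the data structure itself; no separate `set` of seen ids is needed.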
You know how to apply the logic on a `list` of `dict` (your items). Now you want to apply it to a `list` (each file) of `list` of `dict`, so just add another loop around it.
Using `dict.setdefault` you can have nicer and shorter code.
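This answer's code was also lost in extraction; a minimal sketch of the nested loop with `setdefault` might look like the following. The file contents are modelled as in-memory lists here (with real files you would `json.load` each one first), and the variable names are assumptions.

```python
import json

# Each inner list stands for the parsed contents of one JSON file.
files = [
    [{"userid": "1", "name": "a"}, {"userid": "2", "name": "b"}],
    [{"userid": "1", "name": "a"}, {"userid": "3", "name": "c"}],
]

td = {}
for items in files:      # outer loop: one iteration per file
    for item in items:   # inner loop: the original per-item logic
        # setdefault stores the item only if the userid is not
        # already a key, so the explicit `if` check disappears.
        td.setdefault(item["userid"], item)

merged = list(td.values())
```

Note that `setdefault` keeps the first occurrence of each userid, matching the behaviour of the explicit conditional check in the other answer.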