I need to convert a csv file into a json file. I have tried some python examples that are available on-line but none seem to fit my needs.
My csv file contains a header then multiple rows for each date. Im not entirely sure on the terminology but I think I need each unique id in the first field in the csv (NA, NE, SE) to have its own object in the json.
Any help would be appreciated. Cheers!
Example of input
A1,sDate,eDate,A2
NA,01/01/2023,01/05/2023,2
NE,01/01/2023,01/05/2023,3
SE,01/01/2023,01/05/2023,4
NA,02/01/2023,02/05/2023,5
NE,02/01/2023,02/05/2023,6
SE,02/01/2023,02/05/2023,7
My desired output would be
{
"NA": [
{
"sDate": "01/01/2023",
"eDate": "01/05/2023",
"a2": 2
},
{
"sDate": "02/01/2023",
"eDate": "02/05/2023",
"a2": 5
}
],
"NE": [
{
"sDate": "01/01/2023",
"eDate": "01/05/2023",
"a2": 3
},
{
"sDate": "02/01/2023",
"eDate": "02/05/2023",
"a2": 6
}
],
"SE": [
{
"sDate": "01/01/2023",
"eDate": "01/05/2023",
"a2": 4
},
{
"sDate": "02/01/2023",
"eDate": "02/05/2023",
"a2": 7
}
]
I was using the pandas library to help with this but its not giving the results I need.
import pandas
import json
df = pandas.read_csv('file.csv', names=("A1","sDate","eDate","A2"))
df.to_json('file.json', orient='records')
with open('file.json', 'w') as f:
json.dump({'NA': df.to_dict(orient='records')}, f, indent=4)
{
"NA": [
{
"A1": "NA",
"sDate": "01/01/2023",
"eDate": "01/05/2023",
"A2": "2"
},
{
"A1": "NE",
"sDate": "01/01/2023",
"eDate": "01/05/2023",
"A2": "3"
},
"A1": "SE",
"sDate": "01/01/2023",
"eDate": "01/05/2023",
"A2": "4"
},
{
"A1": "NA",
"sDate": "02/01/2023",
"eDate": "02/05/2023",
"A2": "5"
},
{
"A1": "NE",
"sDate": "02/01/2023",
"eDate": "02/05/2023",
"A2": "6"
},
"A1": "SE",
"sDate": "02/01/2023",
"eDate": "02/05/2023",
"A2": "7"
}
]
2
Answers
defaultdict
is very useful for this kind of processingcsv.DictReader
of the file and convert it to a list,list
to thebuild_structure
function.defaultdict
, you candumps
it withjson
.If you want a beautified json string, you can use
json.dumps(structure, sort_keys=True, indent=4)
instead:You can use pandas and groupby:
Output: