Converting large (140MB) JSON file into CSV in Python

AmazingRandom
June 13, 2023
131 views
0 votes
2 Answers

I’ve been trying to do a data wrangling exercise with large files and I chose this big file, I googled around and saw that it is fairly straightforward in Python, but I’m not the most well versed with coding in yet. So I’d need a bit more explanation (Explain like I’m five) on how to set it up properly. I’ve done the ‘saving file with another name to format it’ but that is how it got formatted.
This is how it is formatted as

I’m not sure if it is supposed to look this way from the python codes I found, so, if anyone could help or clarify it to me. This is a code I found.

import json
import csv
 
with open('G:Akhiljsonoutput.json') as json_file:
    jsondata = json.load(json_file)
 
data_file = open('G:Akhiljsonoutput.csv', 'w', newline='')
csv_writer = csv.writer(data_file)
 
count = 0
for data in jsondata:
    if count == 0:
        header = data.keys()
        csv_writer.writerow(header)
        count += 1
    csv_writer.writerow(data.values())
 
data_file.close()

Is that code correct? If yes how do I edit it to convert it? How do I download and save that file so I can open it on excel? Thanks in advance

This is a new thing to me so I’m trying to figure it out, but I’m stuck in what way is the correct one, as I’m not even sure the JSON file is formatted correctly

Answers

- tdelaney
- June 7, 2023 at 11:42 pm
- 0 votes
0
You have a couple of problems. First, your data looks like a dict whose values are the dicts you want written to the CSV. for data in jsondata is just the keys, not the values you want written to the CSV. So, the for loop won’t work.

If it did, the count = 0 thing isn’t quite right. You write the header twice. And by writing the values() to the CSV, you assume that each dict values are always written in the same order. But JSON dict is not ordered, so that assumption doesn’t work.

Better to use csv.DictWriter which handles dictionaries. If this is a well-known JSON format, its best to hard code the header in the python script. That will help the script detect errors. But in this example I assume there is always at least one entry and any of the entry keys can be used for the header.
```
import json
import csv

# json data is a dict whose values are dicts to be written to a csv.

with open('G:Akhiljsonoutput.json') as json_file:
    jsondata = json.load(json_file)

# assuming all dicts have the same keys, grab the first one as header
fieldnames = next(jsondata.values())

with open('G:Akhiljsonoutput.csv', 'w', newline='') as data_file:
    csv_writer = csv.DictWriter(data_file, fieldnames)
    csv_writer.writerows(jsondata.values())
```
Note that this code ignores the keys of the outer dict ("AFG", etc…). If you need that information too, you can add it to the inner dicts before the CSV write.
Login or Signup to reply.

- ZachYoung
- June 8, 2023 at 5:43 am
- 0 votes
0
When converting anything (JSON, XML, …) to CSV you need to have a clear idea of what you want the final CSV to look like.

For JSON that’s just a list of flat objects:
```
[
    {"id": "1", "name": "foo", "score": 0},
    {"id": "2", "name": "bar", "score": 0},
    {"id": "1", "name": "foo", "score": 1},
    {"id": "3", "name": "baz", "score": 0},
    {"id": "3", "name": "baz", "score": 2}
]
```
We can visualize that easily as this CSV:
```
id,name,score
1,foo,0
2,bar,0
1,foo,1
3,baz,0
3,baz,2
```
and the Python code to do the transform:
```
import csv
import json

with open("input_flat.json") as f:
    data = json.load(f)

with open("output_flat.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=data[0].keys())
    writer.writeheader()
    for row in data:
        writer.writerow(row)
```
Since the JSON was just an array of objects, or in Python terms, a list of dicts, we can use the DictWriter to write each individual dict (row) in data. We also have to create the DictWriter with the fieldnames it will be looking for in the individual dicts, so data[0].keys() to get the keys from the first dict in the data.

For JSON a little more like yours, where we have a nested structure, kind of like:
```
{
    "1": {
        "name": "foo",
        "data": [
            {"score": 0},
            {"score": 1}
        ]
    },
    "2": {
        "name": "bar",
        "data": [
            {"score": 0}
        ]
    },
    "3": {
        "name": "baz",
        "data": [
            {"score": 0},
            {"score": 2}
        ]
    }
}
```
Should the CSV look similar to the CSV from above? (only you, dear programmer, can answer that question) If so, we can still use the DictWriter like above, but now the row-dict isn’t handed to us, we need to build it ourselves:
```
with open("input_nested.json") as f:
    all_data = json.load(f)

rows = []
for id_, obj in all_data.items():
    for data in obj["data"]:
        row = {
            "id": id_,
            "name": obj["name"],
            "score": data["score"],
        }

        rows.append(row)

with open("output_nested.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
```
I renamed data to all_data when I loaded the JSON. Next, we flatten each object in all_data, and its sub-objects (under the "data" key), down to a single row; or, "build up" the row-dict from the bottom of the structure. (I also had to be careful and name my variable in the outer loop id_ because id is resevered word in Python.) And that gives us:
```
id,name,score
1,foo,0
1,foo,1
2,bar,0
3,baz,0
3,baz,2
```
Not identical to the one above, because of sorting, but pretty much equal.

I wasn’t able to find the actual JSON you used, despite searching OWID for about 20 minutes, but from your screenshot these structures are very similar. If you can follow along with the simple foo-bar-baz examples and understand the structure of the nested JSON, and then how to work with it in Python, you can start designing your final, desired CSV.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.