Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

How to convert large JSON to CSV in Python?

Caxa
June 13, 2023
138 views
2 votes
3 Answers

I have a 4gb json file I need to convert it to csv I tried the following code:

import json
import csv

csv.field_size_limit(10**9)

With open('name.json') as json_file:
    jsondata = json.load(json_file)
 
data_file = open('name.csv', 'w', newline='')
csv_writer = csv.writer(data_file)
 
count = 0
for data in jsondata:
    if count == 0:
        header = data.keys()
        csv_writer.writerow(header)
        count += 1
    csv_writer.writerow(data.values())
 
data_file.close()

Tried many variants of the same code. Always got error json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 3756)

Need code that reads json line by line and writes data to csv

UPD: Example of three lines from this file on pastebin

Answers

- MohitGoyal
- June 11, 2023 at 3:41 pm
- 0 votes
0
You can read JSON as a Pandas DataFrame and save the loaded DataFrame as CSV.
```
import pandas
df = pandas.read_json(path_or_buffer)
df.to_csv(output_path)
```
You can refer to https://pandas.pydata.org/docs/reference/api/pandas.read_json.html for complete documentation.

This will work as long as JSON is having no syntax errors.
Login or Signup to reply.

As Michael Butscher alluded to in a comment, you probably have a JSON Lines file: multiple valid JSON objects line-after-line. I say probably because your description of the problem and the error codes points to JSON Lines, but the (formatted) JSON in the pastebin link has been indented and therefor isn’t "lines" anymore.

Still, as Michael was saying, you can open the file like normal, iterate over lines, and load each line (as a string):

f_in = open("input.json")
for line in f_in:
    line_data = json.loads(line)

From there you can decide how to get that into your CSV, maybe something like:

    data = line_data["data"]
    writer.writerow([data["id"], data["discount"]])

Here’s a complete suggestion:

import csv
import json

writer = csv.writer(open("output.csv", "w"))

f_in = open("input.json")
for i, line in enumerate(f_in):
    line_data = json.loads(line)
    meta = line_data["meta"]
    data = line_data["data"]

    if i == 0:
        writer.writerow(list(meta.keys()) + list(data.keys()))
    writer.writerow(list(meta.values()) + list(data.values()))

Given this input JSON Lines:

{"meta": {"type": "order"}, "data": {"id":  107042415, "discount":  330, "personCount":  3}}
{"meta": {"type": "order"}, "data": {"id":  107042785, "discount":  0, "personCount":  2}}
{"meta": {"type": "order"}, "data": {"id":  107042866, "discount":  0, "personCount":  1}}

I get this CSV:

type,id,discount,personCount
order,107042415,330,3
order,107042785,0,2
order,107042866,0,1

import json
import csv

csv.field_size_limit(10**9)

# Open the CSV file for writing.
data_file = open('name.csv', 'w', newline='')
csv_writer = csv.writer(data_file)

# Flag to indicate whether the header row has been written to the CSV file.
header_written = False

# Open the JSON file for reading.
with open('name.json', 'r') as json_file:
    for line in json_file:
        # Parse the JSON data for the current line.
        data = json.loads(line)

        # Write the header row if it hasn't been written yet.
        if not header_written:
            csv_writer.writerow(data.keys())
            header_written = True

        # Write the data row.
        csv_writer.writerow(data.values())

data_file.close()

Can you try this?

Please signup or login to give your own answer.

Click here to cancel reply.