skip to Main Content

I have a 4gb json file I need to convert it to csv I tried the following code:

import json
import csv

csv.field_size_limit(10**9)

With open('name.json') as json_file:
    jsondata = json.load(json_file)
 
data_file = open('name.csv', 'w', newline='')
csv_writer = csv.writer(data_file)
 
count = 0
for data in jsondata:
    if count == 0:
        header = data.keys()
        csv_writer.writerow(header)
        count += 1
    csv_writer.writerow(data.values())
 
data_file.close()

Tried many variants of the same code. Always got error json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 3756)

Need code that reads json line by line and writes data to csv

UPD: Example of three lines from this file on pastebin

3

Answers


  1. You can read JSON as a Pandas DataFrame and save the loaded DataFrame as CSV.

    import pandas
    df = pandas.read_json(path_or_buffer)
    df.to_csv(output_path)
    

    You can refer to https://pandas.pydata.org/docs/reference/api/pandas.read_json.html for complete documentation.

    This will work as long as JSON is having no syntax errors.

    Login or Signup to reply.
  2. As Michael Butscher alluded to in a comment, you probably have a JSON Lines file: multiple valid JSON objects line-after-line. I say probably because your description of the problem and the error codes points to JSON Lines, but the (formatted) JSON in the pastebin link has been indented and therefor isn’t "lines" anymore.

    Still, as Michael was saying, you can open the file like normal, iterate over lines, and load each line (as a string):

    f_in = open("input.json")
    for line in f_in:
        line_data = json.loads(line)
    
    

    From there you can decide how to get that into your CSV, maybe something like:

        data = line_data["data"]
        writer.writerow([data["id"], data["discount"]])
    

    Here’s a complete suggestion:

    import csv
    import json
    
    writer = csv.writer(open("output.csv", "w"))
    
    f_in = open("input.json")
    for i, line in enumerate(f_in):
        line_data = json.loads(line)
        meta = line_data["meta"]
        data = line_data["data"]
    
        if i == 0:
            writer.writerow(list(meta.keys()) + list(data.keys()))
        writer.writerow(list(meta.values()) + list(data.values()))
    

    Given this input JSON Lines:

    {"meta": {"type": "order"}, "data": {"id":  107042415, "discount":  330, "personCount":  3}}
    {"meta": {"type": "order"}, "data": {"id":  107042785, "discount":  0, "personCount":  2}}
    {"meta": {"type": "order"}, "data": {"id":  107042866, "discount":  0, "personCount":  1}}
    

    I get this CSV:

    type,id,discount,personCount
    order,107042415,330,3
    order,107042785,0,2
    order,107042866,0,1
    
    Login or Signup to reply.
  3. import json
    import csv
    
    csv.field_size_limit(10**9)
    
    # Open the CSV file for writing.
    data_file = open('name.csv', 'w', newline='')
    csv_writer = csv.writer(data_file)
    
    # Flag to indicate whether the header row has been written to the CSV file.
    header_written = False
    
    # Open the JSON file for reading.
    with open('name.json', 'r') as json_file:
        for line in json_file:
            # Parse the JSON data for the current line.
            data = json.loads(line)
    
            # Write the header row if it hasn't been written yet.
            if not header_written:
                csv_writer.writerow(data.keys())
                header_written = True
    
            # Write the data row.
            csv_writer.writerow(data.values())
    
    data_file.close()
    

    Can you try this?

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search