I have a 4 GB JSON file that I need to convert to CSV. I tried the following code:
import json
import csv
csv.field_size_limit(10**9)
with open('name.json') as json_file:
    jsondata = json.load(json_file)
data_file = open('name.csv', 'w', newline='')
csv_writer = csv.writer(data_file)
count = 0
for data in jsondata:
    if count == 0:
        header = data.keys()
        csv_writer.writerow(header)
        count += 1
    csv_writer.writerow(data.values())
data_file.close()
I tried many variants of the same code and always got the error json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 3756)
I need code that reads the JSON line by line and writes the data to CSV.
UPD: Example of three lines from this file on pastebin
3 Answers
You can read JSON as a Pandas DataFrame and save the loaded DataFrame as CSV.
You can refer to https://pandas.pydata.org/docs/reference/api/pandas.read_json.html for complete documentation.
This will work as long as the JSON has no syntax errors.
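A minimal sketch of this approach, assuming the file is JSON Lines (one object per line), which `read_json` only parses when given `lines=True`. Passing `chunksize` makes pandas return an iterator of DataFrames, so a 4 GB file is not loaded all at once. The function name and default chunk size are illustrative, and the sketch assumes every record has the same keys in the same order so that appended chunks line up with the header.

```python
import pandas as pd


def jsonl_to_csv_pandas(json_path, csv_path, chunksize=100_000):
    """Convert a JSON Lines file to CSV chunk by chunk (assumes uniform keys)."""
    # lines=True: one JSON object per line; chunksize: iterate instead of loading everything.
    reader = pd.read_json(json_path, lines=True, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        chunk.to_csv(
            csv_path,
            mode="w" if i == 0 else "a",  # overwrite on the first chunk, append afterwards
            header=(i == 0),              # write the header row only once
            index=False,
        )
```

Note that `read_json` may infer and convert column types (numbers, dates), so values in the CSV can be formatted differently from the raw JSON.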
As Michael Butscher alluded to in a comment, you probably have a JSON Lines file: multiple valid JSON objects, one per line. I say probably because your description of the problem and the error message point to JSON Lines, but the (formatted) JSON in the pastebin link has been indented and therefore isn’t "lines" anymore.
Still, as Michael was saying, you can open the file like normal, iterate over lines, and load each line (as a string):
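A minimal sketch of that loop, assuming the file is UTF-8 and holds exactly one JSON value per line. The helper name `iter_json_lines` is hypothetical, chosen only for illustration.

```python
import json


def iter_json_lines(path):
    """Yield one parsed JSON value per non-empty line of the file."""
    with open(path, encoding="utf-8") as f:
        for line in f:          # the file object reads lazily, one line at a time
            line = line.strip()
            if line:            # skip blank lines, e.g. a trailing newline
                yield json.loads(line)
```

Because this is a generator, only one line is held in memory at a time, which is what matters for a 4 GB input.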
From there you can decide how to get that into your CSV, maybe something like:
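One possible way to do that, sketched with `csv.DictWriter`: the header is taken from the keys of the first record, and each later record is written as a row. The function name `write_rows` and the sample records are made up for illustration. With the default settings, `DictWriter` raises `ValueError` if a later record has a key that is not in the header, and fills missing keys with an empty string.

```python
import csv
import io


def write_rows(rows, out_file):
    """Write an iterable of dicts as CSV, using the first dict's keys as the header."""
    writer = None
    for row in rows:
        if writer is None:
            # Create the writer lazily so the header can come from the first record.
            writer = csv.DictWriter(out_file, fieldnames=list(row.keys()))
            writer.writeheader()
        writer.writerow(row)


# Usage with an in-memory buffer and two made-up records.
buf = io.StringIO()
write_rows([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], buf)
print(buf.getvalue())
```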
Here’s a complete suggestion:
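An end-to-end sketch under stated assumptions: the input is UTF-8 JSON Lines, every line is a JSON object (not a list or scalar), and records may have different keys. To cope with differing keys without holding the file in memory, this version reads the file twice: once to collect the column names, once to write the rows. The function name `jsonl_to_csv` is hypothetical, and nested objects or arrays are written using Python's string representation rather than being flattened.

```python
import csv
import json


def jsonl_to_csv(json_path, csv_path):
    """Stream a JSON Lines file to CSV without loading it all into memory."""
    # Pass 1: collect every key that appears, preserving first-seen order.
    fieldnames = {}
    with open(json_path, encoding="utf-8") as src:
        for line in src:
            if line.strip():
                fieldnames.update(dict.fromkeys(json.loads(line)))

    # Pass 2: write the rows; keys missing from a record become empty cells.
    with open(json_path, encoding="utf-8") as src, \
            open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(fieldnames), restval="")
        writer.writeheader()
        for line in src:
            if line.strip():
                writer.writerow(json.loads(line))
```

If every record is known to have the same keys, the first pass can be dropped and the header taken from the first line, which halves the reading time.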
Given this input JSON Lines:
I get this CSV:
Can you try this?