I have a CSV file in which each row is a string that is already formatted for JSON. Some of the elements are nested within a given key. Here is a simplified sample row:
{"key1":"string_value","key2":["string_value"],"key4":integer,"key5":[{"nested_key1":"string_value","nested_key2":boolean}],"key6":integer}
I need to take each row and write it to its own JSON file, named row_#.json.
The closest I’ve come is with this code:
import csv
csv_file = 'path/to/my/file.csv'
with open(csv_file, mode='r') as infile:
    for i, line in enumerate(infile):
        with open(f"row_{i}.json", "w") as outfile:
            outfile.write(line)
The outputted files show each key and each string value wrapped in an additional set of double quotes. The entire file contents are also wrapped in quotes. Using the example from above, I end up with:
"{""key1"":""string_value"",""key2"":[""string_value""],""key4"":integer,""key5"":[{""nested_key1"":""string_value"",""nested_key2"":boolean}],""key6"":integer}"
How can I output the original row contents without this extra formatting? Can I do it without reading in the outputted JSON files and trying to replace the strings? Note that I’m not able to go back to the source and format as a proper CSV before reading the file into Python.
2 Answers
The issue you’re encountering happens because the csv.reader or direct reading of the file as strings wraps each line in double quotes and escapes internal quotes when writing back to a file. To resolve this issue, treat each line as a raw string and skip any unnecessary parsing or quoting.
Since the original file is supposed to be a CSV, I suspect those extra quotes are in the file; they’re needed to handle commas and quotes nested in the field value. I’m guessing you don’t see them because you’re viewing the file with a spreadsheet application, not looking at the raw file contents.
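To see where those quotes come from, here is a small sketch using a made-up, shortened row (the real file's contents are an assumption). It writes the JSON text as a single CSV field, which reproduces the doubled quotes from the question, then reads it back:

```python
import csv
import io

# Hypothetical sample row; the actual file contents are not shown in the question.
raw_json = '{"key1":"string_value","key2":["string_value"]}'

# Writing the JSON text as one CSV field doubles the inner quotes and wraps
# the whole field in quotes, because the text contains commas and quotes.
buffer = io.StringIO()
csv.writer(buffer).writerow([raw_json])
csv_line = buffer.getvalue()
print(csv_line)

# Parsing the line with csv.reader undoes that quoting.
parsed = next(csv.reader(io.StringIO(csv_line)))
print(parsed[0] == raw_json)
```

If the raw file looks like the first printed line, the quoting is part of the CSV encoding rather than something Python added.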
So you need to use a CSV reader to parse the file, then write the raw text to the output files.
line[0] is needed because csv.reader() parses each row into a list of fields.
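A minimal sketch of that approach follows. It is not necessarily the answerer's exact code: the function name split_rows_to_json and the out_dir parameter are made up for illustration, and it assumes the file is UTF-8 and that each row holds the JSON text in a single field.

```python
import csv
import os

def split_rows_to_json(csv_path, out_dir="."):
    """Write the first field of each CSV row to its own row_<i>.json file.

    Hypothetical helper: assumes one JSON string per row, stored in the
    first (and only) field, and a UTF-8 encoded input file.
    """
    written = []
    # newline="" lets the csv module handle line endings itself.
    with open(csv_path, mode="r", newline="", encoding="utf-8") as infile:
        for i, line in enumerate(csv.reader(infile)):
            if not line:
                continue  # csv.reader yields an empty list for blank lines
            out_path = os.path.join(out_dir, f"row_{i}.json")
            with open(out_path, "w", encoding="utf-8") as outfile:
                # line is a list of fields; line[0] is the unquoted JSON text.
                outfile.write(line[0])
            written.append(out_path)
    return written
```

Calling split_rows_to_json('path/to/my/file.csv') would then write row_0.json, row_1.json, and so on to the current directory, each containing the row text without the extra CSV quoting.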