
I have a CSV file in which each row is a string that is already formatted for JSON. Some of the elements are nested within a given key. Here is a simplified sample row:

{"key1":"string_value","key2":["string_value"],"key4":integer,"key5":[{"nested_key1":"string_value","nested_key2":boolean}],"key6":integer}

I need to take each row and write it to its own JSON file, named row_#.json.

The closest I’ve come is with this code:


import csv

csv_file = 'path/to/my/file.csv'

with open(csv_file, mode='r') as infile:
    for i, line in enumerate(infile):
        with open(f"row_{i}.json", "w") as outfile:
            outfile.write(line)

The outputted files show each key and each string value wrapped in an additional set of double quotes. The entire file contents are also wrapped in quotes. Using the example from above, I end up with:

"{""key1"":""string_value"",""key2"":[""string_value""],""key4"":integer,""key5"":[{""nested_key1"":""string_value"",""nested_key2"":boolean}],""key6"":integer}"

How can I output the original row contents without this extra formatting? Can I do it without reading in the outputted JSON files and trying to replace the strings? Note that I’m not able to go back to the source and format as a proper CSV before reading the file into Python.

2 Answers


  1. Reading a file line by line and writing those lines back out does not add quotes on its own. Doubled quotes appear when a CSV writer (or a spreadsheet export) wraps a field in double quotes and escapes the quotes inside it. If each line of your file is already raw JSON, treat it as a raw string and skip any parsing or quoting.

    # Path to your CSV file
    csv_file = 'path/to/my/file.csv'
    
    # Open the CSV file
    with open(csv_file, mode='r') as infile:
        # Use enumerate to keep track of row numbers
        for i, line in enumerate(infile):
            # Strip any leading/trailing whitespace or newline characters
            json_content = line.strip()
            
            # Write each row as its own JSON file
            with open(f"row_{i}.json", "w") as outfile:
                # Write the raw JSON string without adding additional quotes
                outfile.write(json_content)
    
  2. Since the original file is supposed to be a CSV, I suspect those extra quotes are in the file; they’re needed to handle commas and quotes nested in the field value. I’m guessing you don’t see them because you’re viewing the file with a spreadsheet application, not looking at the raw file contents.

    So you need to use a CSV reader to parse the file, then write the raw text to the output files.

    import csv
    
    csv_file = 'path/to/my/file.csv'
    
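    # newline='' lets the csv module handle line endings itself
    with open(csv_file, mode='r', newline='') as infile:
        in_csv = csv.reader(infile)
        for i, line in enumerate(in_csv):
            with open(f"row_{i}.json", "w") as outfile:
                # line[0] is the single field, with the CSV quoting removed
                outfile.write(line[0] + "\n")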
    

    line[0] is needed because csv.reader() parses each row into a list of fields.
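
To see why the CSV reader matters, here is a minimal sketch of the round trip, done in memory rather than against your file. The sample row and the variable names are made up for illustration, and it assumes the file was produced with the default CSV quoting rules (field wrapped in double quotes, inner quotes doubled).

```python
import csv
import io

# Hypothetical JSON text stored as a single CSV field.
original = '{"key1":"string_value","key2":["string_value"],"key6":1}'

# Write it the way a CSV exporter would: the field contains commas and
# double quotes, so it gets wrapped in quotes and inner quotes are doubled.
buffer = io.StringIO()
csv.writer(buffer).writerow([original])
raw_line = buffer.getvalue()

# Parse the raw line back with csv.reader, which undoes that quoting.
rows = list(csv.reader(io.StringIO(raw_line)))
recovered = rows[0][0]

print(raw_line.strip())  # the quoted form, as stored in the CSV
print(recovered)         # the original JSON text
```

Here raw_line has the same shape as the output in the question, and recovered is the JSON text you want in each row_#.json file.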
