skip to Main Content

I need to convert a csv file into a json file. I have tried some python examples that are available on-line but none seem to fit my needs.

My csv file contains a header then multiple rows for each date. Im not entirely sure on the terminology but I think I need each unique id in the first field in the csv (NA, NE, SE) to have its own object in the json.

Any help would be appreciated. Cheers!

Example of input

A1,sDate,eDate,A2
NA,01/01/2023,01/05/2023,2
NE,01/01/2023,01/05/2023,3
SE,01/01/2023,01/05/2023,4
NA,02/01/2023,02/05/2023,5
NE,02/01/2023,02/05/2023,6
SE,02/01/2023,02/05/2023,7

My desired output would be

{
    "NA": [
        {
            "sDate": "01/01/2023",
            "eDate": "01/05/2023",
            "a2": 2
        },
        {
            "sDate": "02/01/2023",
            "eDate": "02/05/2023",
            "a2": 5
        }
],
    "NE": [
        {
            "sDate": "01/01/2023",
            "eDate": "01/05/2023",
            "a2": 3
        },
        {
            "sDate": "02/01/2023",
            "eDate": "02/05/2023",
            "a2": 6
        }
],
    "SE": [
        {
            "sDate": "01/01/2023",
            "eDate": "01/05/2023",
            "a2": 4
        },
        {
            "sDate": "02/01/2023",
            "eDate": "02/05/2023",
            "a2": 7
        }
]

I was using the pandas library to help with this but its not giving the results I need.

import pandas
import json

df = pandas.read_csv('file.csv', names=("A1","sDate","eDate","A2"))

df.to_json('file.json', orient='records')

with open('file.json', 'w') as f:
    json.dump({'NA': df.to_dict(orient='records')}, f, indent=4)
{
    "NA": [
        {
            "A1": "NA",
            "sDate": "01/01/2023",
            "eDate": "01/05/2023",
            "A2": "2"
        },
        {
            "A1": "NE",
            "sDate": "01/01/2023",
            "eDate": "01/05/2023",
            "A2": "3"
        },
            "A1": "SE",
            "sDate": "01/01/2023",
            "eDate": "01/05/2023",
            "A2": "4"
        },
         {
            "A1": "NA",
            "sDate": "02/01/2023",
            "eDate": "02/05/2023",
            "A2": "5"
        },
        {
            "A1": "NE",
            "sDate": "02/01/2023",
            "eDate": "02/05/2023",
            "A2": "6"
        },
            "A1": "SE",
            "sDate": "02/01/2023",
            "eDate": "02/05/2023",
            "A2": "7"
        }
]

2

Answers


  1. defaultdict is very useful for this kind of processing

    def build_structure(reader: typing.List[dict]) -> dict:
        structure = collections.defaultdict(list)
        for d in reader:
            structure[d["A1"]].append(
                {"sDate": d["sDate"], "eDate": d["eDate"], "a2": d["A2"]}
            )
        return structure
    
    1. Read the file and create a csv.DictReader of the file and convert it to a list,
    2. Send the list to the build_structure function.
    3. You have a defaultdict, you can dumps it with json.
    import collections
    import csv
    import json
    import typing
    
    
    def read_csv(file_name: str) -> list:
        with open(file_name, "r") as f:
            reader = csv.DictReader(f)
            return list(reader)
    
    
    def build_structure(reader: typing.List[dict]) -> dict:
        structure = collections.defaultdict(list)
        for d in reader:
            structure[d["A1"]].append(
                {"sDate": d["sDate"], "eDate": d["eDate"], "a2": d["A2"]}
            )
        return structure
    
    
    data = read_csv("data.csv")
    structure = build_structure(data)
    print(json.dumps(structure))
    # {'NA': [{'a2': '2', 'eDate': '01/05/2023', 'sDate': '01/01/2023'},
    #         {'a2': '5', 'eDate': '02/05/2023', 'sDate': '02/01/2023'}],
    #  'NE': [{'a2': '3', 'eDate': '01/05/2023', 'sDate': '01/01/2023'},
    #         {'a2': '6', 'eDate': '02/05/2023', 'sDate': '02/01/2023'}],
    #  'SE': [{'a2': '4', 'eDate': '01/05/2023', 'sDate': '01/01/2023'},
    #         {'a2': '7', 'eDate': '02/05/2023', 'sDate': '02/01/2023'}]}
    

    If you want a beautified json string, you can use json.dumps(structure, sort_keys=True, indent=4) instead:

    >>> print(json.dumps(structure, sort_keys=True, indent=4))
    {
        "NA": [
            {
                "a2": "2",
                "eDate": "01/05/2023",
                "sDate": "01/01/2023"
            },
            {
                "a2": "5",
                "eDate": "02/05/2023",
                "sDate": "02/01/2023"
            }
        ],
        "NE": [
            {
                "a2": "3",
                "eDate": "01/05/2023",
                "sDate": "01/01/2023"
            },
            {
                "a2": "6",
                "eDate": "02/05/2023",
                "sDate": "02/01/2023"
            }
        ],
        "SE": [
            {
                "a2": "4",
                "eDate": "01/05/2023",
                "sDate": "01/01/2023"
            },
            {
                "a2": "7",
                "eDate": "02/05/2023",
                "sDate": "02/01/2023"
            }
        ]
    }
    
    Login or Signup to reply.
  2. You can use pandas and groupby:

    import pandas as pd
    import json
    
    df = pd.read_csv("your_file.csv", keep_default_na=False)
    res = {}
    for a1, df_gp in df.groupby('A1'):
        res[a1] = df_gp.drop(columns='A1').to_dict(orient='records')
    print(json.dumps(res, indent=4))
    

    Output:

    {
        "NA": [
            {
                "sDate": "01/01/2023",
                "eDate": "01/05/2023",
                "A2": 2
            },
            {
                "sDate": "02/01/2023",
                "eDate": "02/05/2023",
                "A2": 5
            }
        ],
        "NE": [
            {
                "sDate": "01/01/2023",
                "eDate": "01/05/2023",
                "A2": 3
            },
            {
                "sDate": "02/01/2023",
                "eDate": "02/05/2023",
                "A2": 6
            }
        ],
        "SE": [
            {
                "sDate": "01/01/2023",
                "eDate": "01/05/2023",
                "A2": 4
            },
            {
                "sDate": "02/01/2023",
                "eDate": "02/05/2023",
                "A2": 7
            }
        ]
    }
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search