
I am trying to read a CSV file into a dataframe and transform it into nested JSON. The delimiter in the CSV file is ';'. After loading the data into a dataframe, this is what I get:

name       frequency  frequency_start_date  frequency_count  schedule_type       day_of_week      Comments
Pattern 1  daily      2022-02-01            1                weekdays            Monday           null
Pattern 2  daily      2024-01-01            1, 5             weekdays, weekdays  Monday, Tuesday  null, null
Pattern 3  daily      2021-03-21            1, 2             weekdays, weekdays  Thursday,Friday  null, null

This is the code I wrote to load the csv as a dataframe:

df1_data = pd.read_csv(csv1, delimiter=';', keep_default_na=False, dtype=object)

This is the code I wrote to transform the dataframe into json:

nested_cols = ['frequency_count', 'schedule_type', 'day_of_week']

df1_data['patterns'] = df1_data[nested_cols].to_dict('records')

df1_nested2 = df1_data[['name', 'frequency', 'frequency_start_date', 'patterns', 'comments']].to_json(orient='records', indent=4)

When I run this, I get the following:

[
    {
        "name": "Pattern 1",
        "frequency": "daily",
        "frequency_start_date": "2022-02-01",
        "patterns": {
            "frequency_count": "1",
            "schedule_type": "weekdays",
            "day_of_week": null
        },
        "comments": null
    },
    {
        "name": "Pattern 2",
        "frequency": "daily",
        "frequency_start_date": "2024-01-01",
        "patterns": {
            "frequency_count": "1, 5",
            "schedule_type": "weekdays,weekdays",
            "day_of_week": "null, null"
        },
        "comments": null
    },
    {
        "name": "Pattern 3",
        "frequency": "daily",
        "frequency_start_date": "2021-03-21",
        "patterns": {
            "frequency_count": "1, 2",
            "schedule_type": "weekdays,weekdays",
            "day_of_week": "null, null"
        },
        "comments": null
    }
]

But this is what I want:

[
    {
        "name": "Pattern 1",
        "frequency": "daily",
        "frequency_start_date": "2022-02-01",
        "patterns": {
            "frequency_count": "1",
            "schedule_type": "weekdays",
            "day_of_week": null
        },
        "comments": null
    },
    {
        "name": "Pattern 2",
        "frequency": "daily",
        "frequency_start_date": "2024-01-01",
        "patterns": {
            "frequency_count": 1,
            "schedule_type": "weekdays",
            "day_of_week": null
        },
        {
            "frequency_count": 5,
            "schedule_type": "weekdays",
            "day_of_week": null
        },
        "comments": null
    },
    {
        "name": "Pattern 3",
        "frequency": "daily",
        "frequency_start_date": "2021-03-21",
        "patterns": {
            "frequency_count": 1,
            "schedule_type": "weekdays",
            "day_of_week": null
        },
        {
            "frequency_count": 2,
            "schedule_type": "weekdays",
            "day_of_week": null
        },
        "comments": null
    }
]

This is where I am stuck. Is this an issue with reading the CSV into the dataframe, or with converting the dataframe to JSON? It seems that the columns with multiple values are read as single strings. I read similar questions on Stack Overflow, but none of them had quite this issue, and the answers did not help me solve it.
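As a quick check (using the same df1_data loaded above, with its default integer index), printing one of the multi-value cells shows that it comes back as a single string rather than a list of values:

print(repr(df1_data.loc[1, 'frequency_count']))   # e.g. '1, 5' - one string
print(type(df1_data.loc[1, 'frequency_count']))   # <class 'str'>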

Any help would be greatly appreciated.

Thanks,

Beta

2 Answers


  1. Chosen as BEST ANSWER

    Thanks to @BeRT2me, the final code for this is the following:

    nested_cols = ['frequency_count', 'schedule_type', 'day_of_week']
    
    # splitting the strings in the nested columns
    for col in nested_cols:
        df1_data[col] = df1_data[col].str.split(',')
    
    # exploding the columns (explode is not in place, so assign the result back)
    df1_data = df1_data.explode(nested_cols)
    
    # changed the frequency_count type to int, as a number is always expected
    df1_data['frequency_count'] = df1_data['frequency_count'].astype(int)
    
    df1_data['patterns'] = df1_data[nested_cols].to_dict("records")
    
    # using @BeRT2me's response (on df1_data):
    df1_nested = df1_data.pivot_table(
        index=["name", "frequency", "frequency_start_date", "Comments"],
        values="patterns",
        aggfunc=list,
    ).reset_index().to_json(orient="records", indent=4)
    
    # replacing the string "null" with null in the json
    df1_nested = df1_nested.replace('"null"', 'null')
    

    The output is:

    [
    {
        "pattern_name":"daily_pattern_cadence_start_after_effective_start",
        "cadence":"Daily",
        "cadence_start_date":"2023-02-01",
        "comments":null,
        "patterns":[
            {
                "cadence_count":1,
                "weekend_type":"Western",
                "annual_reset":"No",
                "fixed_day_of_week":null,
                "effective_start_date":"2022-12-31",
                "effective_end_date":"2023-12-31"
            },
            {
                "cadence_count":1,
                "weekend_type":" Western",
                "annual_reset":" No",
                "fixed_day_of_week":" null",
                "effective_start_date":" 2020-12-31",
                "effective_end_date":" 2021-12-31"
            }
        ]
    },
    {
        "pattern_name":"daily_pattern_effective_start_greater_previous_effective_end",
        "cadence":"Daily",
        "cadence_start_date":"2019-05-31",
        "comments":null,
        "patterns":[
            {
                "cadence_count":1,
                "weekend_type":"Western",
                "annual_reset":"No",
                "fixed_day_of_week":null,
                "effective_start_date":"2019-05-31",
                "effective_end_date":"2021-12-31"
            },
            {
                "cadence_count":1,
                "weekend_type":" Western",
                "annual_reset":" No",
                "fixed_day_of_week":null,
                "effective_start_date":" 2022-03-31",
                "effective_end_date":" null"
            }
        ]
    },
    {
        "pattern_name":"daily_pattern_success",
        "cadence":"Daily",
        "cadence_start_date":"2022-02-01",
        "comments":"",
        "patterns":[
            {
                "cadence_count":1,
                "weekend_type":"Western",
                "annual_reset":"No",
                "fixed_day_of_week":null,
                "effective_start_date":"2022-02-01",
                "effective_end_date":null
            }
        ]
    }
    

    ]
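
    One wrinkle remains in this output: because the split is on ',' alone and the final replace only matches the exact text '"null"', some nested values keep a leading space (e.g. " Western") and one " null" survives as a string. A possible tweak, sketched below against the sample data only (not tested on the real file), is to strip whitespace and convert "null" to a real missing value while splitting, so to_json writes proper nulls; the pivot_table / to_json step stays the same, minus the final replace:

    nested_cols = ['frequency_count', 'schedule_type', 'day_of_week']
    
    for col in nested_cols:
        # split on ',', strip stray spaces, and turn the string "null" into a real None
        df1_data[col] = df1_data[col].str.split(',').apply(
            lambda parts: [None if p.strip() == 'null' else p.strip() for p in parts]
        )
    
    df1_data = df1_data.explode(nested_cols)
    # assumes frequency_count itself never contains "null"
    df1_data['frequency_count'] = df1_data['frequency_count'].astype(int)
    df1_data['patterns'] = df1_data[nested_cols].to_dict('records')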


  2. Given:

            name frequency frequency_start_date frequency_count       schedule_type      day_of_week    Comments
    0  Pattern 1     daily           2022-02-01               1            weekdays           Monday         NaN
    1  Pattern 2     daily           2024-01-01            1, 5  weekdays, weekdays  Monday, Tuesday  null, null
    2  Pattern 3     daily           2021-03-21            1, 2  weekdays, weekdays  Thursday,Friday  null, null
    

    Doing:

    1. Convert the strings to lists:
    multi_cols = ["frequency_count", "schedule_type", "day_of_week", "Comments"]
    
    for col in multi_cols:
    df[col] = df[col].str.split(", ?")  # regex: split on comma plus optional space
    
    2. Explode these columns:
    df = df.explode(multi_cols)
    
    3. Make the patterns column and fix the Comments column:
    df["patterns"] = df[["frequency_count", "schedule_type", "day_of_week"]].to_dict("records")
    df["Comments"] = df["Comments"].fillna("null")
    
    4. Pivot and export to JSON:
    output = df.pivot_table(
        index=["name", "frequency", "frequency_start_date", "Comments"], 
        values="patterns", 
        aggfunc=list, 
    ).reset_index().to_json(orient="records", indent=4)
    print(output)
    

    Output:

    [
        {
            "name":"Pattern 1",
            "frequency":"daily",
            "frequency_start_date":"2022-02-01",
            "Comments":"null",
            "patterns":[
                {
                    "frequency_count":"1",
                    "schedule_type":"weekdays",
                    "day_of_week":"Monday"
                }
            ]
        },
        {
            "name":"Pattern 2",
            "frequency":"daily",
            "frequency_start_date":"2024-01-01",
            "Comments":"null",
            "patterns":[
                {
                    "frequency_count":"1",
                    "schedule_type":"weekdays",
                    "day_of_week":"Monday"
                },
                {
                    "frequency_count":"5",
                    "schedule_type":"weekdays",
                    "day_of_week":"Tuesday"
                }
            ]
        },
        {
            "name":"Pattern 3",
            "frequency":"daily",
            "frequency_start_date":"2021-03-21",
            "Comments":"null",
            "patterns":[
                {
                    "frequency_count":"1",
                    "schedule_type":"weekdays",
                    "day_of_week":"Thursday"
                },
                {
                    "frequency_count":"2",
                    "schedule_type":"weekdays",
                    "day_of_week":"Friday"
                }
            ]
        }
    ]
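
    If the JSON should also have numeric frequency_count values and real nulls (as in the desired output in the question), one possible variation on steps 3-4 is sketched below: cast the counts after exploding, keep missing Comments as NaN instead of the string "null", and group with groupby(dropna=False) rather than pivot_table, since pivot_table drops rows whose index keys are NaN. This is untested beyond the sample data and needs pandas >= 1.1 for dropna=False:

    import numpy as np
    
    # numeric counts so to_json writes numbers rather than strings
    df["frequency_count"] = df["frequency_count"].astype(int)
    df["patterns"] = df[["frequency_count", "schedule_type", "day_of_week"]].to_dict("records")
    
    # keep missing comments as NaN -> to_json writes null
    df["Comments"] = df["Comments"].replace("null", np.nan)
    
    output = (
        df.groupby(["name", "frequency", "frequency_start_date", "Comments"], dropna=False)["patterns"]
          .agg(list)
          .reset_index()
          .to_json(orient="records", indent=4)
    )
    print(output)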
    