I am trying to read a csv file into a dataframe to transform that into nested jsons. The deliminter in the csv file is ‘;’. After loading the data into a dataframe, following is what I get:
name | frequency | frequency_start_date | frequency_count | schedule_type | day_of_week | Comments |
---|---|---|---|---|---|---|
Pattern 1 | daily | 2022-02-01 | 1 | weekdays | Monday | null |
Pattern 2 | daily | 2024-01-01 | 1, 5 | weekdays, weekdays | Monday, Tuesday | null, null |
Pattern 3 | daily | 2021-03-21 | 1, 2 | weekdays, weekdays | Thursday,Friday | null, null |
This is the code I wrote to load the csv as a dataframe:
df1_data = pd.read_csv(csv1, delimiter=';', keep_default_na=False, dtype=object)
This is the code I wrote to transform the dataframe into json:
nested_cols = ['frequency_count', 'schedule_type', 'day_of_week']
df1_data['patterns'] = df1_data[nested_cols].to_dict('records')
df1_nested2 = df1_data[['name', 'frequency', 'frequency_start_date', 'patterns', 'comments']].to_json(orient='records', indent=4)
When I run this, I get the following:
[
{
"name": "Pattern 1",
"frequency": "daily",
"frequency_start_date": "2022-02-01",
"patterns": {
"frequency_count": "1",
"schedule_type": "weekdays",
"day_of_week": null
},
"comments": null
},
{
"name": "Pattern 2",
"frequency": "daily",
"frequency_start_date": "2024-01-01",
"patterns": {
"frequency_count": "1, 5",
"schedule_type": "weekdays,weekdays",
"day_of_week": "null, null"
},
"comments": null
},
{
"name": "Pattern 3",
"frequency": "daily",
"frequency_start_date": "2021-03-21",
"patterns": {
"frequency_count": "1, 2",
"schedule_type": "weekdays,weekdays",
"day_of_week": "null, null"
},
"comments": null
}
]
But this is what I want:
[
{
"name": "Pattern 1",
"frequency": "daily",
"frequency_start_date": "2022-02-01",
"patterns": {
"frequency_count": "1",
"schedule_type": "weekdays",
"day_of_week": null
},
"comments": null
},
{
"name": "Pattern 2",
"frequency": "daily",
"frequency_start_date": "2024-01-01",
"patterns": {
"frequency_count": 1,
"schedule_type": "weekdays",
"day_of_week": null
},
{
"frequency_count": 5,
"schedule_type": "weekdays",
"day_of_week": null
},
"comments": null
},
{
"name": "Pattern 3",
"frequency": "daily",
"frequency_start_date": "2021-03-21",
"patterns": {
"frequency_count": 1,
"schedule_type": "weekdays",
"day_of_week": null
},
{
"frequency_count": 2,
"schedule_type": "weekdays",
"day_of_week": null
},
"comments": null
}
]
This is where I am stuck. Is this an issue with csv into dataframe? Or dataframe into json? It seems that the columns where there are multiple values are read as strings. I read questions posted at StackOverflow but none of them actually had this issue or answers did not help me to solve this.
Any help would be greatly appreciated.
Thanks,
Beta
2
Answers
Thanks to @BeRT2me, the final code for this is the following:
The output is:
]
Given:
Doing:
patterns
column and fix theComments
column:Output: