I have a CSV file in the format below and want to convert it to a nested dict.
name,columns,tests
ABC_ESTIMATE_REFINED,cntquota,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,cntquota,not_null
ABC_ESTIMATE_REFINED,is_purged,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,is_purged,not_null
Expected Output
{
    "name": "ABC_ESTIMATE_REFINED",
    "columns": [
        {
            "name": "cntquota",
            "tests": [
                "dbt_expectations.expect_column_to_exist",
                "not_null"
            ]
        },
        {
            "name": "is_purged",
            "tests": [
                "dbt_expectations.expect_column_to_exist",
                "not_null"
            ]
        }
    ]
}
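My attempt is below, but it does not come close to the expected output.
import pandas as pd

df = pd.read_csv('data.csv')
print(df)
# Groups by (name, columns), but yields a flat dict keyed by tuples, not the nested structure
nested_dict = df.groupby(['name','columns']).apply(lambda x: x[['tests']].to_dict(orient='records')).to_dict()
print(nested_dict)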
3 Answers
IIUC, you can use nested groupby calls:
Since the processing occurs by pairs of columns, you could also imagine a recursive approach:
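The nested groupby idea could look like the sketch below. It is only an illustration under a few assumptions: the sample rows are copied from the question and read through `io.StringIO` so the snippet is self-contained, `sort=False` is used to keep groups in file order, and the result is wrapped in a list with one dict per value of `name`.

```python
import io
import pandas as pd

# Sample data copied from the question; in practice use pd.read_csv('data.csv').
CSV = """name,columns,tests
ABC_ESTIMATE_REFINED,cntquota,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,cntquota,not_null
ABC_ESTIMATE_REFINED,is_purged,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,is_purged,not_null
"""
df = pd.read_csv(io.StringIO(CSV))

# Outer groupby: one dict per table name.
# Inner groupby: one dict per column, collecting its tests in a list.
# sort=False keeps groups in order of first appearance in the file.
result = [
    {
        "name": table,
        "columns": [
            {"name": column, "tests": sub["tests"].tolist()}
            for column, sub in group.groupby("columns", sort=False)
        ],
    }
    for table, group in df.groupby("name", sort=False)
]

print(result[0])  # the single table from the sample data
```

With only one table in the file, `result[0]` is the dict to compare against the expected output in the question.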
Output:
Demo of the recursive approach with a different order of the keys:
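A recursive variant might be sketched as follows. The helper name `nest` and its `keys` parameter are hypothetical, chosen for illustration. Each level groups on the first key, labels the group value `"name"`, and stores the nested remainder under the next key; the last key is returned as a plain list.

```python
import io
import pandas as pd

# Sample data copied from the question.
CSV = """name,columns,tests
ABC_ESTIMATE_REFINED,cntquota,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,cntquota,not_null
ABC_ESTIMATE_REFINED,is_purged,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,is_purged,not_null
"""
df = pd.read_csv(io.StringIO(CSV))

def nest(frame, keys):
    """Hypothetical helper: nest `frame` following the ordered list `keys`."""
    key, *rest = keys
    if not rest:
        # Last key: return its values as a plain list.
        return frame[key].tolist()
    child = rest[0]
    # One dict per group; the remaining keys are handled recursively.
    return [
        {"name": value, child: nest(group, rest)}
        for value, group in frame.groupby(key, sort=False)
    ]

original_order = nest(df, ["name", "columns", "tests"])
swapped_order = nest(df, ["name", "tests", "columns"])
print(swapped_order)
```

One caveat of this sketch: because every level is labelled `"name"`, the CSV column called `name` has to come first in `keys`, otherwise the two dict keys collide.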
Something like:
Seems to do the job for the expected output in your OP.
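For a file with a single table name, one possible sketch is to aggregate the tests per column with `agg(list)` and take the table name from the first row. This is an assumption about what such a solution could look like, not a reconstruction of any particular code.

```python
import io
import pandas as pd

# Sample data copied from the question.
CSV = """name,columns,tests
ABC_ESTIMATE_REFINED,cntquota,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,cntquota,not_null
ABC_ESTIMATE_REFINED,is_purged,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,is_purged,not_null
"""
df = pd.read_csv(io.StringIO(CSV))

# Collect the tests of each column into a list, then turn each row into a dict.
columns = (
    df.groupby("columns", sort=False)["tests"]
    .agg(list)
    .reset_index()
    .rename(columns={"columns": "name"})
    .to_dict(orient="records")
)

# Assumes the file contains exactly one distinct value in `name`.
result = {"name": df["name"].iloc[0], "columns": columns}
print(result)
```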
Indeed, if there are multiple distinct values of name, you will have to store the result as a list of dictionaries instead:
Which will render:
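A sketch of the list-of-dictionaries variant is shown below. The second table, `XYZ_ESTIMATE_RAW`, and its rows are hypothetical data added only to illustrate several names; the helper `columns_for` is likewise a made-up name.

```python
import io
import json
import pandas as pd

# First two rows are from the question; the XYZ_ESTIMATE_RAW rows are
# hypothetical, added only to show more than one table name.
CSV = """name,columns,tests
ABC_ESTIMATE_REFINED,cntquota,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,cntquota,not_null
XYZ_ESTIMATE_RAW,id,not_null
XYZ_ESTIMATE_RAW,id,unique
"""
df = pd.read_csv(io.StringIO(CSV))

def columns_for(group):
    """Hypothetical helper: list of {name, tests} dicts for one table."""
    tests = group.groupby("columns", sort=False)["tests"].agg(list)
    return [{"name": column, "tests": values} for column, values in tests.items()]

# One dictionary per distinct table name, in file order.
result = [
    {"name": table, "columns": columns_for(group)}
    for table, group in df.groupby("name", sort=False)
]
print(json.dumps(result, indent=4))
```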