skip to Main Content

I have a json dataset where each item/index can contain 2 nested dictionaries. The problem is that one of these nested dictionaries contains all of the exact key:value pairs as its parent dictionary. To put it in other words, I have a parent "Account" and any time there are "Sub-Accounts" it places the Sub-Accounts in the nested dictionary, and they are never seen as their own standalone item/index.

Here is the sample json of one item/index. Essentially, I need the sub_accounts object to extracted and become its own index. As you can see, it contains all of the same key:value objects as the parent containins the sub_accounts.

    {
        "classification": [
            {
                "classificationId": "Cash",
                "taxonomyId": "accounting.gp"
            }
        ],
        "id": "235",
        "kind": "Real",
        "name": "Checking",
        "sub_accounts": [
            {
                "classification": [
                    {
                        "classificationId": "Cash",
                        "taxonomyId": "accounting.gp"
                    }
                ],
                "id": "236",
                "kind": "Real",
                "name": "Cash Reserve",
                "sub_accounts": []
            }
        ]
    },

I have been able to use json_normalize or even variations of .pop() to accomplish a flattening of data and I have tried to explore other flattening options, but with no luck on the specific task I am trying to accomplish. Those solutions usually just result with the subaccounts still be associated to the original index.

2

Answers


  1. You could use a recursive function to traverse the hierarchy while progressively popping out the "sub_accounts" keys:

    def extractAccounts(accounts):
        return [s for a in accounts 
                  for s in (a,*extractAccounts(a.pop("sub_accounts",[])))]
    

    From a list of account objects:

    data =  [{
            "classification": [
                {
                    "classificationId": "Cash",
                    "taxonomyId": "accounting.gp"
                }
            ],
            "id": "235",
            "kind": "Real",
            "name": "Checking",
            "sub_accounts": [
                {
                    "classification": [
                        {
                            "classificationId": "Cash",
                            "taxonomyId": "accounting.gp"
                        }
                    ],
                    "id": "236",
                    "kind": "Real",
                    "name": "Cash Reserve",
                    "sub_accounts": []
                }
            ]
        }]
    

    Output:

    accounts = extractAccounts(data)
    for i,account in enumerate(accounts):
        print("Account #",i)
        print(account)
    
    
    Account # 0
    {'classification': [{'classificationId': 'Cash', 'taxonomyId': 'accounting.gp'}], 'id': '235', 'kind': 'Real', 'name': 'Checking'}
    Account # 1
    {'classification': [{'classificationId': 'Cash', 'taxonomyId': 'accounting.gp'}], 'id': '236', 'kind': 'Real', 'name': 'Cash Reserve'}
    

    If your top level is a single account (i.e. not a list), just place it in a list when calling the function: extractAccount([data])

    Login or Signup to reply.
  2. I don’t have a generic answer, but this seems to do what you need:

    raw_data = """                                                                                                                                                                                 
    [                                                                                                                                                                                              
        {                                                                                                                                                                                          
            "classification": [                                                                                                                                                                    
                {                                                                                                                                                                                  
                    "classificationId": "Cash",                                                                                                                                                    
                    "taxonomyId": "accounting.gp"                                                                                                                                                  
                }                                                                                                                                                                                  
            ],                                                                                                                                                                                     
            "id": "235",                                                                                                                                                                           
            "kind": "Real",                                                                                                                                                                        
            "name": "Checking",                                                                                                                                                                    
            "sub_accounts": [                                                                                                                                                                      
                {                                                                                                                                                                                  
                    "classification": [                                                                                                                                                            
                        {                                                                                                                                                                          
                            "classificationId": "Cash",                                                                                                                                            
                            "taxonomyId": "accounting.gp"                                                                                                                                          
                        }                                                                                                                                                                          
                    ],                                                                                                                                                                             
                    "id": "236",                                                                                                                                                                   
                    "kind": "Real",                                                                                                                                                                
                    "name": "Cash Reserve",                                                                                                                                                        
                    "sub_accounts": []                                                                                                                                                             
                }                                                                                                                                                                                  
            ]                                                                                                                                                                                      
        }                                                                                                                                                                                          
    ]                                                                                                                                                                                              
    """                                                                                                                                                                                            
                                                                                                                                                                                                   
    import json                                                                                                                                                                                    
    jdict = json.loads(raw_data)                                                                                                                                                                   
                                                                                                                                                                                                   
    empty_list = list()                                                                                                                                                                            
    result     = list()                                                                                                                                                                            
    for elem in jdict:                                                                                                                                                                             
        sub_elem_list = elem['sub_accounts']                                                                                                                                                       
        elem['sub_accounts'] = empty_list                                                                                                                                                          
        result.append(elem)                                                                                                                                                                        
        for sub_elem in sub_elem_list:                                                                                                                                                             
            result.append(sub_elem)                                                                                                                                                                
                                                                                                                                                                                                   
    print(json.dumps(result, indent=4)) 
    
    output = """
    [
        {
            "classification": [
                {
                    "classificationId": "Cash",
                    "taxonomyId": "accounting.gp"
                }
            ],
            "id": "235",
            "kind": "Real",
            "name": "Checking",
            "sub_accounts": []
        },
        {
            "classification": [
                {
                    "classificationId": "Cash",
                    "taxonomyId": "accounting.gp"
                }
            ],
            "id": "236",
            "kind": "Real",
            "name": "Cash Reserve",
            "sub_accounts": []
        }
    ]
    
    """
    

    When you have nested structures you need to nest your loops. The other answer has recursion, which can cause problems if you’re nesting over a thousand recursive calls (so probably not this case). I also assumed that you care about order, preferring the parent’s id to be first. Also, if you’re trying to get rid of the sub_accounts from the json, then you’d want to pop it from the records, but I again assume that the structure should be maintained.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search