skip to Main Content

I’ve tried using both Python and Postgres to convert a table of data into JSON that can be used to render a nested dropdown in a web UI.

Here is how the data is structured:

group_desc category_desc subcategory_desc item_desc
Group C Category C Subcategory A Item H
Group C Category C Subcategory A Item I
Group C Category C Subcategory A Item J
Group C Category C Subcategory B Item K
Group C Category C Subcategory B Item L
Group D Category D Subcategory C Item M
Group D Category D Subcategory C Item N

Here is how I’d like the JSON to appear:

{
    'group': [{
        group_desc: 'Group C',
        'categories': [{
            'category_desc': 'Category C',
            'subcategories': [{
                    'subcategory_desc': 'Subcategory A',
                    'items': [
                        'item_desc': 'Item H',
                        'item_desc': 'Item I',
                        'item_desc': 'Item J',
                    ]
                },
                {
                    'subcategory_desc': 'Subcategory B',
                    'items': [
                        'item_desc': 'Item K',
                        'item_desc': 'Item L',
                    ] 
                }]
            }]
        },
    {
        'group_desc': 'Group D',
        'categories': [{
            'category_desc': 'Category D',
            'subcategories': [{
                    'subcategory_desc': 'Subcategory D',
                    'items': [
                        'item_desc': 'Item M',
                        'item_desc': 'Item N',
                    ]
                }]
            }]
    }]
}

In essence, for each group of records, translate it to an array of objects (or a list of dictionaries). Thank you!

I’ve tried using DataFrameGroupBy with Python. I’ve also tried json_agg and json_build_object using Postgres. The closest that I was able to get is only a list of dictionaries without the categories and subcategories in a list.

Here is the code in Python using a DataFrame structured like the table above:


import pandas as pd

## PREPARE DATAFRAME
item_dict = {
    'group_desc': ['Group C', 'Group C', 'Group C', 'Group C', 'Group C', 'Group D', 'Group D'],
    'category_desc': ['Category C', 'Category C', 'Category C', 'Category C', 'Category C', 'Category D', 'Category D'],
    'subcategory_desc': ['Subcategory A', 'Subcategory A', 'Subcategory A', 'Subcategory B', 'Subcategory B', 'Subcategory C', 'Subcategory C'],
    'item_desc': ['Item H', 'Item I', 'Item J', 'Item K', 'Item L', 'Item M', 'Item N'] 
}

hierarchy_df = pd.DataFrame(item_dict)

hierarchy_df.head()

## TRY TO CREATE THE JSON

records = []

group_list = []
category_list = []
subcategory_list = []

prev_group = ''
prev_category = ''
prev_subcategory = ''

for ix, row in hierarchy_df.iterrows():   
    
    # initialize dict 
    entry = {}
    
    # pull item group
    group_desc = row['group_desc']
    category_desc = row['category_desc']
    subcategory_desc = row['subcategory_desc']
    desc = row['item_desc']
    
    # reset list
    if((prev_subcategory != subcategory_desc) | (prev_category != category_desc) | (prev_group != group_desc)):
        group_list = []
        category_list = []
        subcategory_list = []

    # check for category
    if(category_desc) is not None:

        # check for subcategory
        if(subcategory_desc) is not None:
            
            subcategory_list.append(desc)
            
            entry['group'] = {
                'group_desc': group_desc,
                'category': {
                    'category_desc': category_desc,
                    'subcategory': {
                        'subscategory_desc': subcategory_desc,
                        'item': {
                            'item_desc': subcategory_list
                        }
                    }
                }
            }
        # if no subcategory
        else:
            
            category_list.append(desc)
            
            entry['group'] = {
                'group_desc': group_desc,
                'category': {
                    'category_desc': category_desc,
                        'item': {
                            'item_desc': category_list
                        }
                    }
                }
    # if no category
    else:
        group_list.append(desc)
   
        entry['group'] = {
            'group_desc': group_desc,
                'item': {
                    'desc': group_list
                    }
                }

    records.append(entry)
    
    prev_group = row['group_desc']
    prev_category = row['category_desc']
    prev_subcategory = row['subcategory_desc']

    
# dedupe (temp workaround)
records = [i for n, i in enumerate(records)
            if i not in records[n + 1:]]

# preview result
records

2

Answers


  1. Chosen as BEST ANSWER

    Thanks to @jqurious, I was able to come up with this solution. This helped nest all of the related data into the desired JSON format.

    Thanks everyone!

    final_json = {'group': []}
    
    for group_name, group_df in df.groupby('group_desc'):
        group_data = {'group_desc': group_name, 'categories': []}
        
        for category_name, category_df in group_df.groupby('category_desc'):
            category_data = {'category_desc': category_name, 'subcategories': []}
            
            for subcategory_name, subcategory_df in category_df.groupby('subcategory_desc'):
                subcategory_data = {'subcategory_desc': subcategory_name, 'items': []}
                
                for item in subcategory_df['item_desc']:
                    subcategory_data['items'].append({'item_desc': item})
                
                category_data['subcategories'].append(subcategory_data)
            
            group_data['categories'].append(category_data)
        
        final_json['group'].append(group_data)
    

  2. There are probably better ways of doing this but as for your current approach, this may be helpful.

    Instead of trying to create the final output in one pass, you can first create a nested dict using the values as the keys, and only create a list at the last level:

    out['Group A']['Category C']['Subcategory A'] = [ ... ]
    
    out = {}
    
    for group, *categories, subcategory, item in df.itertuples(index=None, name=None):
        group = out.setdefault(group, {})
        for category in categories:
           group = group.setdefault(category, {})
        group.setdefault(subcategory, []).append({'item_desc': item})
    
    {'Group C': {'Category C': {'Subcategory A': [{'item_desc': 'Item H'},
        {'item_desc': 'Item I'},
        {'item_desc': 'Item J'}],
       'Subcategory B': [{'item_desc': 'Item K'}, {'item_desc': 'Item L'}]}},
     'Group D': {'Category D': {'Subcategory C': [{'item_desc': 'Item M'},
        {'item_desc': 'Item N'}]}}}
    

    You can then add the headers/lists back in to the higher levels:

    [      
       {
          'group_desc':  group, 
          'categories': [
             { 
                'category_desc': category,
                'subcategories': [  
                   {
                       'subcategory_desc': subcategory,
                       'items': items
                   }
                   for subcategory, items in subcategories.items()
                ]
             }
             for category, subcategories in categories.items()
          ]
       }
       for group, categories in out.items()
    ]
    

    Output:

    [{'group_desc': 'Group C',
      'categories': [{'category_desc': 'Category C',
        'subcategories': [{'subcategory_desc': 'Subcategory A',
          'items': [{'item_desc': 'Item H'},
           {'item_desc': 'Item I'},
           {'item_desc': 'Item J'}]},
         {'subcategory_desc': 'Subcategory B',
          'items': [{'item_desc': 'Item K'}, {'item_desc': 'Item L'}]}]}]},
     {'group_desc': 'Group D',
      'categories': [{'category_desc': 'Category D',
        'subcategories': [{'subcategory_desc': 'Subcategory C',
          'items': [{'item_desc': 'Item M'}, {'item_desc': 'Item N'}]}]}]}]
    

    Thank you! Here is the final code I used.

    Dart’s edit to the comment. Your idea was very helpful! I wound up using a nested loop based on your logic! Here is the code:

    final_json = {'group': []}
    
    for group_name, group_df in df.groupby('group_desc'):
        group_data = {'group_desc': group_name, 'categories': []}
        
        for category_name, category_df in group_df.groupby('category_desc'):
            category_data = {'category_desc': category_name, 'subcategories': []}
            
            for subcategory_name, subcategory_df in category_df.groupby('subcategory_desc'):
                subcategory_data = {'subcategory_desc': subcategory_name, 'items': []}
                
                for item in subcategory_df['item_desc']:
                    subcategory_data['items'].append({'item_desc': item})
                
                category_data['subcategories'].append(subcategory_data)
            
            group_data['categories'].append(category_data)
        
        final_json['group'].append(group_data)
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search