
Given the following Pandas DataFrame (the original DataFrame has 200+ rows):

import pandas as pd
df = pd.DataFrame({
    'child': ['Europe', 'France', 'Paris', 'North America', 'US', 'Canada'],
    'parent': ['', 'Europe', 'France', '', 'North America', 'North America'],
    'value': [746.4, 67.75, 2.16, 579, 331.9, 38.25]
})

df

|---+---------------+---------------+--------|
|   | child         | parent        |  value |
|---+---------------+---------------+--------|
| 0 | Europe        |               | 746.40 |
| 1 | France        | Europe        |  67.75 |
| 2 | Paris         | France        |   2.16 |
| 3 | North America |               | 579.00 |
| 4 | US            | North America | 331.90 |
| 5 | Canada        | North America |  38.25 |
|---+---------------+---------------+--------|

I want to generate the following JSON tree:

  [
      {
      name: 'Europe',
      value: 746.4,
      children: [
          {
          name: 'France',
          value: 67.75,
          children: [
              {
              name: 'Paris',
              value: 2.16
              }
          ]
          }
      ]
      },
      {
      name: 'North America',
      value: 579,
      children: [
          {
          name: 'US',
          value: 331.9,
          },
          {
          name: 'Canada',
          value: 38.25
          }
      ]
      }
  ];

This tree will be used as input for ECharts visualizations, such as a basic sunburst chart.

3 Answers


  1. You can use the networkx package for this. First convert the dataframe to a graph:

    import networkx as nx
    
    G = nx.from_pandas_edgelist(df, source='parent', target='child', edge_attr='value', create_using=nx.DiGraph)
    nx.draw(G, with_labels=True)
    

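    This produces a directed graph in which every (parent, child) edge carries the row's value as an edge attribute. Rows with an empty parent become edges from an artificial root node named ''.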

    Next, we get the graph as a JSON formatted tree:

    from networkx.readwrite import json_graph
    
    data = json_graph.tree_data(G, root='')
    data = data['children']  # remove the root
    

    This will look as follows:

    [{'id': 'Europe',
      'children': [{'id': 'France', 'children': [{'id': 'Paris'}]}]},
     {'id': 'North America', 'children': [{'id': 'US'}, {'id': 'Canada'}]}]
    

    Finally, post-process the tree by renaming 'id' to 'name' and restoring each node's value from its incoming edge. There may be a more direct way of doing this, but the following works:

    edge_values = nx.get_edge_attributes(G,'value')
    
    def post_process_json(data, parent=''):
        # Rename 'id' to 'name' and look up the value stored on the (parent, child) edge
        data['name'] = data.pop('id')
        data['value'] = edge_values[(parent, data['name'])]
        if 'children' in data:
            data['children'] = [post_process_json(child, parent=data['name']) for child in data['children']]
        return data
    
    data = [post_process_json(d) for d in data]
    

    Final result:

    [{'children': [{'children': [{'name': 'Paris', 'value': 2.16}],
        'name': 'France',
        'value': 67.75}],
      'name': 'Europe',
      'value': 746.4},
     {'children': [{'name': 'US', 'value': 331.9},
       {'name': 'Canada', 'value': 38.25}],
      'name': 'North America',
      'value': 579.0}]
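
    Because every child in this data has exactly one parent, the value can also be looked up by child name alone, without tracking the parent during recursion. The following is only a sketch of that idea, not part of the answer above: rename_and_attach and the values dict are hypothetical names, and the input is the tree_data output shown earlier, typed in by hand so the snippet runs without networkx.

```python
def rename_and_attach(node, values):
    """Return a copy of a tree_data-style node using 'name' and 'value' keys."""
    out = {"name": node["id"], "value": values[node["id"]]}
    if "children" in node:
        out["children"] = [rename_and_attach(c, values) for c in node["children"]]
    return out


# tree_data output after dropping the artificial root, copied from above
data = [
    {"id": "Europe", "children": [{"id": "France", "children": [{"id": "Paris"}]}]},
    {"id": "North America", "children": [{"id": "US"}, {"id": "Canada"}]},
]

# Hypothetical lookup; with the DataFrame it could be built as
# dict(zip(df["child"], df["value"]))
values = {
    "Europe": 746.4, "France": 67.75, "Paris": 2.16,
    "North America": 579, "US": 331.9, "Canada": 38.25,
}

result = [rename_and_attach(d, values) for d in data]
```

    Unlike post_process_json, this version builds new dicts instead of mutating the tree_data output in place.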
    
  2. You could first create the individual nodes as { name, value } dicts, and key them by name. Then link them up:

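    result = []
    # The empty-string key acts as a virtual root whose children are the top-level nodes
    d = { "": { "children": result } }
    # First pass: create one { name, value } node per row
    for child, value in zip(df["child"], df["value"]):
        d[child] = { "name": child, "value": value }
    # Second pass: attach each node to its parent, creating the list on first use
    for child, parent in zip(df["child"], df["parent"]):
        if "children" not in d[parent]:
            d[parent]["children"] = []
        d[parent]["children"].append(d[child])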
    

    For the example, result would be:

    [{
        'name': 'Europe', 
        'value': 746.4, 
        'children': [{
            'name': 'France', 
            'value': 67.75, 
            'children': [{
                'name': 'Paris', 
                'value': 2.16
            }]
        }]
    }, {
        'name': 'North America', 
        'value': 579.0, 
        'children': [{
            'name': 'US', 
            'value': 331.9
        }, {
            'name': 'Canada', 
            'value': 38.25
        }]
    }]
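
    Since the question asks for JSON, the same two-pass linking can be wrapped in a function and serialized with the standard json module. This is a minimal sketch under assumptions: build_tree and root_key are hypothetical names, and the records list stands in for what df.to_dict("records") would be expected to return for the example DataFrame.

```python
import json


def build_tree(records, root_key=""):
    """Nest flat child/parent/value records into a list of top-level nodes."""
    # First pass: one node per row, keyed by name, so row order does not matter
    nodes = {r["child"]: {"name": r["child"], "value": r["value"]} for r in records}
    roots = []
    # Second pass: attach each node to its parent, or to the top level
    for r in records:
        node = nodes[r["child"]]
        if r["parent"] == root_key:
            roots.append(node)
        else:
            # setdefault adds a 'children' list only to nodes that have children
            nodes[r["parent"]].setdefault("children", []).append(node)
    return roots


# Stand-in for df.to_dict("records") on the example DataFrame (assumption)
records = [
    {"child": "Europe", "parent": "", "value": 746.4},
    {"child": "France", "parent": "Europe", "value": 67.75},
    {"child": "Paris", "parent": "France", "value": 2.16},
    {"child": "North America", "parent": "", "value": 579},
    {"child": "US", "parent": "North America", "value": 331.9},
    {"child": "Canada", "parent": "North America", "value": 38.25},
]

tree = build_tree(records)
json_text = json.dumps(tree, indent=2)
```

    A row whose parent never appears in the child column raises a KeyError here, just as in the loop above, which can help surface inconsistent rows in a larger DataFrame.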
    
  3. There is a library called bigtree which can do exactly what you are looking for.

    import json
    import bigtree
    
    # Set the parent values for children without parents to ROOT
    df["parent"] = df["parent"].replace(r'^$', "ROOT", regex=True)
    
    tree = bigtree.dataframe_to_tree_by_relation(df, "child", "parent")
    # tree.show(all_attrs=True)
    
    # Convert to dict and discard the ROOT key
    tree_dict = bigtree.tree_to_nested_dict(tree, all_attrs=True)["children"]
    
    # Convert the dict to the desired string format
    print(json.dumps(tree_dict, indent=2))
    

    Also see: Read data from a pandas DataFrame and create a tree using anytree in python
