How to export a weighted edge list to a JSON tree?

crocefisso
January 31, 2024
139 views
2 votes
3 Answers

Given the following Pandas DataFrame (the original DataFrame has 200+ rows):

import pandas as pd
df = pd.DataFrame({
    'child': ['Europe', 'France', 'Paris','North America', 'US', 'Canada'],
    'parent': ["", 'Europe', 'France',"", 'North America', 'North America'],
    'value': [746.4, 67.75, 2.16, 579, 331.9, 38.25]
})

df

|---+---------------+---------------+--------|
|   | child         | parent        |  value |
|---+---------------+---------------+--------|
| 0 | Europe        |               | 746.40 |
| 1 | France        | Europe        |  67.75 |
| 2 | Paris         | France        |   2.16 |
| 3 | North America |               | 579.00 |
| 4 | US            | North America | 331.90 |
| 5 | Canada        | North America |  38.25 |
|---+---------------+---------------+--------|

I want to generate the following JSON tree:

  [
      {
      name: 'Europe',
      value: 746.4,
      children: [
          {
          name: 'France',
          value: 67.75,
          children: [
              {
              name: 'Paris',
              value: 2.16
              }
          ]
          }
      ]
      },
      {
      name: 'North America',
      value: 579,
      children: [
          {
          name: 'US',
          value: 331.9
          },
          {
          name: 'Canada',
          value: 38.25
          }
      ]
      }
  ];

This tree will be used as input for ECharts visualizations, such as the basic sunburst chart.

Answers

You can use the networkx package for this. First convert the dataframe to a graph:

import networkx as nx

G = nx.from_pandas_edgelist(df, source='parent', target='child', edge_attr='value', create_using=nx.DiGraph)
nx.draw(G, with_labels=True)

This results in a weighted directed graph.

Next, we get the graph as a JSON-formatted tree:

from networkx.readwrite import json_graph

data = json_graph.tree_data(G, root='')
data = data['children']  # remove the root

This will look as follows:

[{'id': 'Europe',
  'children': [{'id': 'France', 'children': [{'id': 'Paris'}]}]},
 {'id': 'North America', 'children': [{'id': 'US'}, {'id': 'Canada'}]}]

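Finally, post-process the JSON data by adding back the values and renaming 'id' to 'name'. The values are missing because they are stored on the edges, and tree_data only emits node attributes. There may be a cleaner way to do this, but the following works.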

edge_values = nx.get_edge_attributes(G,'value')

def post_process_json(data, parent=''):
    # Rename 'id' to 'name' and look up the weight of the edge parent -> node
    data['name'] = data.pop('id')
    data['value'] = edge_values[(parent, data['name'])]
    if 'children' in data:
        data['children'] = [post_process_json(child, parent=data['name']) for child in data['children']]
    return data

data = [post_process_json(d) for d in data]

Final result:

[{'children': [{'children': [{'name': 'Paris', 'value': 2.16}],
    'name': 'France',
    'value': 67.75}],
  'name': 'Europe',
  'value': 746.4},
 {'children': [{'name': 'US', 'value': 331.9},
   {'name': 'Canada', 'value': 38.25}],
  'name': 'North America',
  'value': 579.0}]
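
The post-processing step could also be written so that it builds new dicts instead of modifying the tree_data output in place. The following is only a sketch: `data` and `edge_values` are hard-coded here to mirror the example above (in practice they would come from `json_graph.tree_data` and `nx.get_edge_attributes`), and the function name `with_values` is made up for illustration.

```python
# Hard-coded stand-ins for the example: the tree_data output (minus the root)
# and the edge weights keyed by (parent, child).
data = [
    {"id": "Europe", "children": [{"id": "France", "children": [{"id": "Paris"}]}]},
    {"id": "North America", "children": [{"id": "US"}, {"id": "Canada"}]},
]
edge_values = {
    ("", "Europe"): 746.4,
    ("Europe", "France"): 67.75,
    ("France", "Paris"): 2.16,
    ("", "North America"): 579.0,
    ("North America", "US"): 331.9,
    ("North America", "Canada"): 38.25,
}

def with_values(node, parent=""):
    """Return a new {name, value[, children]} dict; the input node is left unchanged."""
    out = {"name": node["id"], "value": edge_values[(parent, node["id"])]}
    if "children" in node:
        out["children"] = [with_values(c, parent=node["id"]) for c in node["children"]]
    return out

tree = [with_values(n) for n in data]
```

Because each dict is created with 'name' first, then 'value', then 'children', the key order matches the layout asked for in the question.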

You could first create the individual nodes as { name, value } dicts, and key them by name. Then link them up:

result = []
d = { "": { "children": result } }
for child, value in zip(df["child"], df["value"]):
    d[child] = { "name": child, "value": value }
for child, parent in zip(df["child"], df["parent"]):
    if "children" not in d[parent]:
        d[parent]["children"] = []
    d[parent]["children"].append(d[child])

For the example, result would be:

[{
    'name': 'Europe', 
    'value': 746.4, 
    'children': [{
        'name': 'France', 
        'value': 67.75, 
        'children': [{
            'name': 'Paris', 
            'value': 2.16
        }]
    }]
}, {
    'name': 'North America', 
    'value': 579.0, 
    'children': [{
        'name': 'US', 
        'value': 331.9
    }, {
        'name': 'Canada', 
        'value': 38.25
    }]
}]
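
The same two-pass idea can be wrapped in a function and serialized with the standard json module. This is a minimal sketch rather than a definitive implementation: the function name `edge_list_to_tree`, the plain tuples used as input, and the error raised for an unknown parent are all assumptions added for illustration.

```python
import json

def edge_list_to_tree(rows, root=""):
    """Build a nested list of {name, value[, children]} dicts from (child, parent, value) rows."""
    rows = list(rows)
    forest = []
    # Sentinel entry for the root marker; its children are the top-level nodes.
    nodes = {root: {"children": forest}}
    # First pass: create every node, so row order does not matter when linking.
    for child, _parent, value in rows:
        nodes[child] = {"name": child, "value": value}
    # Second pass: attach each node to its parent.
    for child, parent, _value in rows:
        if parent not in nodes:
            raise KeyError(f"parent {parent!r} of {child!r} is not listed as a child")
        nodes[parent].setdefault("children", []).append(nodes[child])
    return forest

rows = [
    ("Europe", "", 746.4),
    ("France", "Europe", 67.75),
    ("Paris", "France", 2.16),
    ("North America", "", 579.0),
    ("US", "North America", 331.9),
    ("Canada", "North America", 38.25),
]
tree = edge_list_to_tree(rows)
tree_json = json.dumps(tree, indent=2)
```

With the DataFrame from the question, the rows should be obtainable with df[["child", "parent", "value"]].itertuples(index=False, name=None). The resulting string is strict JSON (quoted keys), which ECharts can load after parsing.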

- NightTrain
- January 31, 2024 at 1:43 pm
- 0 votes
There is a library called bigtree which can do exactly what you are looking for.
```
import json
import bigtree

# Set the parent values for children without parents to ROOT
df["parent"] = df["parent"].replace(r'^$', "ROOT", regex = True)

tree = bigtree.dataframe_to_tree_by_relation(df, "child", "parent")
# tree.show(all_attrs = True)

# Convert to dict and discard the ROOT key
tree_dict = bigtree.tree_to_nested_dict(tree, all_attrs = True)["children"]

# Convert the dict to the desired string format
print(json.dumps(tree_dict, indent = 2))
```
Also see: Read data from a pandas DataFrame and create a tree using anytree in python