
I have this data structure:

[
    {
        'field_a': 8, 
        'field_b': 9, 
        'field_c': 'word_a', 
        'field_d': True, 
        'children': [
                        {
                            'field_a': 9, 
                            'field_b': 9, 
                            'field_c': 'word_b', 
                            'field_d': False, 
                            'children': [
                                            {
                                                'field_a': 9, 
                                                'field_b': 9, 
                                                'field_c': 'word_c', 
                                                'field_d': False, 
                                                'children': [
                                                           ]
                                            }
                                       ]
                        }
                    ]
    }
]

and I want to keep (for printing purposes) something like this:

[
    {
        'field_c': 'word_a', 
        'children': [
                        {
                            'field_c': 'word_b', 
                            'children': [
                                            {
                                                'field_c': 'word_c', 
                                                'children': [
                                                           ]
                                            }
                                       ]
                        }
                    ]
    }
]

What is the most pythonic way to achieve it?

I cannot modify the original data, but I can make a copy of it

2 Answers


  1. You can use a recursive function to traverse the nested structure and create a new dictionary with only the desired fields. Here’s an example:

    import copy
    
    def filter_data(data):
        filtered_data = []
        for item in data:
            new_item = {'field_c': item['field_c']}
            if 'children' in item:
                new_item['children'] = filter_data(item['children'])
            filtered_data.append(new_item)
        return filtered_data
    
    original_data = [...]  # your original data
    filtered_data = filter_data(copy.deepcopy(original_data))
    print(filtered_data)
    
    

    In your example you only need one specific field, and for the nesting you can see here

    new_item['children'] = filter_data(item['children'])

    that we are calling our function recursively.

    I think this is a Pythonic approach.
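
    For example, applied to a reduced version of the data from the question (assuming the nested key is spelled 'children' consistently), it would produce the trimmed structure directly:

    original_data = [
        {'field_a': 8, 'field_b': 9, 'field_c': 'word_a', 'field_d': True,
         'children': [
             {'field_a': 9, 'field_b': 9, 'field_c': 'word_b', 'field_d': False,
              'children': []}
         ]}
    ]
    print(filter_data(original_data))
    # [{'field_c': 'word_a', 'children': [{'field_c': 'word_b', 'children': []}]}]

    As far as I can tell, filter_data only builds new dictionaries and never mutates the items it reads, so the deepcopy is mainly a belt-and-braces precaution here.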

  2. Mohiuddin Sumon’s answer may be Pythonic and a pleasure to read, but I’ve often been told, and my own experience confirms it, that Python doesn’t like recursion; no, it hates it. Yes, it’s easy to read, but I wanted to compare the recursive and iterative approaches, at least for this question, which lends itself very well to it…

    I therefore began by generating data shaped like the structure in the question:

    def generate(nb_levels, nb_children, fields):
        if nb_levels == 0:
            return []
        g = generate(nb_levels - 1, nb_children, fields)
        return [
            {
                field: i for field in fields
            } | {'children': g}
            for i in range(nb_children)
        ]
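
    One detail worth noting, if I read generate correctly: the same child list g is reused for every sibling, so the result is really a DAG that the filter functions below traverse as if it were a full tree. A quick check:

    tree = generate(2, 2, ['field_a', 'field_b'])
    print(tree[0]['children'] is tree[1]['children'])  # True: siblings share one child list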
    

    And I modified the approach by adding a parameter to select which fields to keep. I quickly realized that using deepcopy is quite expensive, and that it’s better not to copy the data in its entirety, especially when there are many initial fields and very few fields to keep.

    So I rewrote the recursive function:

    def filter_data_recursive(data, fields_to_conserve):
        # Rebuild each node with only the requested fields, then merge in the
        # recursively filtered children (if any) using the dict-union operator.
        return [
            {field: item[field] for field in fields_to_conserve} |
            ({'children': filter_data_recursive(item['children'], fields_to_conserve)}
             if 'children' in item else {})
            for item in data
        ]
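
    The | dict-merge used above requires Python 3.9+. And just to illustrate the deepcopy cost, here is a rough, made-up comparison rather than a rigorous benchmark (the flat data below is invented purely for the illustration):

    import copy
    import timeit

    # Made-up flat data: many nodes, many fields, only one field worth keeping.
    flat = [{f'field_{k}': k for k in range(20)} | {'children': []}
            for _ in range(100_000)]

    # Deep-copying everything vs. rebuilding only the kept field.
    print(timeit.timeit(lambda: copy.deepcopy(flat), number=1))
    print(timeit.timeit(lambda: filter_data_recursive(flat, ['field_0']), number=1))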
    

    And I wrote an iterative function:

    def filter_data_iterative(data, fields_to_conserve):
        stack = [(data, None)]
        result = [] 
        mapping = {id(data): result}
    
        while stack:
            current, parent = stack.pop()
            for item in current:
                # Copy only the kept fields of the node itself
                new_item = {}
                for field in fields_to_conserve:
                    new_item[field] = item[field]
                
                # Attach the node to its parent
                if parent is None:
                    mapping[id(current)].append(new_item)
                else:
                    parent.append(new_item)
                
                # Link the node to its children
                if 'children' in item:
                    new_item['children'] = []
                    stack.append((item['children'], new_item['children']))
                    mapping[id(item['children'])] = new_item['children']
        
        return result
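
    Just as a sanity check (this was not part of my benchmark): the iterative version also survives a degenerate, very deep chain that the recursive versions cannot handle under Python's default recursion limit of 1000:

    # Build a 10_000-level deep chain without any recursion.
    deep = [{'field_c': 'leaf', 'children': []}]
    for i in range(10_000):
        deep = [{'field_c': f'level_{i}', 'children': deep}]

    trimmed = filter_data_iterative(deep, ['field_c'])    # fine at any depth
    # filter_data_recursive(deep, ['field_c'])            # would raise RecursionError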
    

    Without question, the recursive approach is more beautiful and easier to read. Let’s move on to the tests:

    nb_levels  nb_children  nb_fields  nb_fields_kp  Iterative time  Recursive time
    5          15           5          1             2.550           2.702
    5          15           5          3             2.717           2.960
    5          15           5          5             3.296           3.397
    8          5            5          1             1.568           2.109
    8          5            5          3             1.507           1.917
    8          5            5          5             1.962           2.961
    10         4            5          1             5.228           7.764
    10         4            5          3             6.334           8.363
    10         4            5          5             6.501           8.980
    18         2            5          1             2.580           3.332
    18         2            5          3             2.930           3.689
    18         2            5          5             3.015           3.955
    100        1            5          1             3.060 (10000)   3.241 (10000)
    100        1            5          3             3.661 (10000)   3.894 (10000)
    100        1            5          5             3.805 (10000)   4.491 (10000)
    200        1            5          1             3.203 (5000)    4.242 (5000)
    200        1            5          3             3.829 (5000)    4.314 (5000)
    200        1            5          5             4.227 (5000)    4.652 (5000)
    500        1            5          1             1.694 (1000)    2.322 (1000)
    500        1            5          3             1.672 (1000)    2.293 (1000)
    500        1            5          5             2.072 (1000)    2.981 (1000)
    1000       1            5          1             3.124 (1000)    killed
    1000       1            5          3             3.144 (1000)    killed
    1000       1            5          5             3.884 (1000)    killed
    2000       1            5          1             3.553 (500)     killed
    2000       1            5          3             3.720 (500)     killed
    2000       1            5          5             3.934 (500)     killed

    Unfortunately, my (recursive!!) generator is killed at 3000 levels, so I can’t push the iterative function to its limits. Maybe that is beside the point anyway, since it is the recursive generator that failed us, not the iterative filter…

    So what’s my conclusion? Well, first of all, Mohiuddin Sumon’s solution is undoubtedly the best in terms of readability. But it uses deepcopy, which is particularly inefficient and fragile here.
    So, if your data is too big and the deepcopy-based version gets killed, use my recursive function, which is less readable but still correct.
    And finally, if your goal is raw efficiency (… then stop using Python), the best would be the iterative approach, but it is admittedly hideous and hard to read.
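
    (As an aside, not a recommendation: if you want to push the recursive versions a bit deeper anyway, Python lets you raise the limit with sys.setrecursionlimit, at the risk of crashing the interpreter outright if you set it too high.)

    import sys
    sys.setrecursionlimit(10_000)   # default is 1000; raise with caution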

    In the end, I feel stupid for having done such tests on a high-level language whose aim is to be readable rather than efficient. But I wanted to try.
