I have this data structure:
[
    {
        'field_a': 8,
        'field_b': 9,
        'field_c': 'word_a',
        'field_d': True,
        'children': [
            {
                'field_a': 9,
                'field_b': 9,
                'field_c': 'word_b',
                'field_d': False,
                'children': [
                    {
                        'field_a': 9,
                        'field_b': 9,
                        'field_c': 'wod_c',
                        'field_d': False,
                        'children': [
                        ]
                    }
                ]
            }
        ]
    }
]
and I want to keep (for printing purposes) something like this:
[
    {
        'field_c': 'word_a',
        'children': [
            {
                'field_c': 'word_b',
                'children': [
                    {
                        'field_c': 'wod_c',
                        'children': [
                        ]
                    }
                ]
            }
        ]
    }
]
What is the most pythonic way to achieve it?
I cannot modify the original data, but I can make a copy of it.
2 Answers
You can use a recursive function to traverse the nested structure and create a new dictionary with only the desired fields. Here’s an example:
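A minimal sketch of that idea, not necessarily the answerer's exact code. It assumes the nested list always lives under the key `children` and that the copy is made with `copy.deepcopy` before pruning; `prune`, `keep` and `child_key` are hypothetical names chosen for illustration.

```python
from copy import deepcopy


def prune(nodes, keep="field_c", child_key="children"):
    """Recursively delete every key except `keep` and `child_key`, in place."""
    for node in nodes:
        for key in list(node):  # list() so keys can be deleted while looping
            if key not in (keep, child_key):
                del node[key]
        # Recurse into the nested list (missing key is treated as no children).
        prune(node.get(child_key, []), keep, child_key)
    return nodes


data = [
    {"field_a": 8, "field_b": 9, "field_c": "word_a", "field_d": True,
     "children": [
         {"field_a": 9, "field_b": 9, "field_c": "word_b", "field_d": False,
          "children": []},
     ]},
]

printable = prune(deepcopy(data))  # the original `data` is left untouched
print(printable)
# [{'field_c': 'word_a', 'children': [{'field_c': 'word_b', 'children': []}]}]
```

Pruning a deep copy keeps the function short, at the cost of copying fields that are thrown away immediately afterwards.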
In your example you only need one specific field, and the nesting is handled by calling the function recursively on each list of children.
I think this is a Pythonic approach.
Mohiuddin Sumon’s answer may be Pythonic and a pleasure to read, but I’ve often been told, and my own experience confirms, that Python doesn’t like recursion; no, it hates it. Yes, recursion is easy to read, but I wanted to compare the recursive and iterative approaches, at least for this question, which lends itself very well to the comparison…
I therefore began by generating data shaped like the data in the question:
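A sketch of what such a generator could look like. The parameter names `nb_levels`, `nb_children` and `nb_fields` are borrowed from the test table further down; the function name `generate`, the `field_<i>` key names and the `children` key are assumptions for illustration, not the author's actual code.

```python
def generate(nb_levels, nb_children, nb_fields):
    """Build a list of `nb_children` nodes, nested `nb_levels` deep.

    Each node carries `nb_fields` dummy fields plus a 'children' list.
    """
    if nb_levels == 0:
        return []
    return [
        {
            **{f"field_{i}": i for i in range(nb_fields)},
            "children": generate(nb_levels - 1, nb_children, nb_fields),
        }
        for _ in range(nb_children)
    ]


data = generate(nb_levels=3, nb_children=2, nb_fields=4)
print(len(data), sorted(data[0]))
```

Because the generator is itself recursive, it is bound by Python's recursion limit for large `nb_levels`.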
I then modified the approach by adding a parameter to select which fields to keep. I quickly realized that deepcopy is quite expensive, and that it is better not to copy the data in its entirety, especially when there are many initial fields and very few fields to keep.
So I rewrote the recursive function:
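One way such a rewrite could look: instead of deep-copying and deleting, it builds new dictionaries that contain only the requested fields. The names `keep_fields_rec`, `fields` and `child_key` are hypothetical, and the nested list is assumed to live under `children`.

```python
def keep_fields_rec(nodes, fields, child_key="children"):
    """Return a new nested list holding only `fields` plus the child list."""
    return [
        {
            # Copy only the requested fields that are present on this node.
            **{key: node[key] for key in fields if key in node},
            # Rebuild the child list recursively; the source is never mutated.
            child_key: keep_fields_rec(node.get(child_key, []), fields, child_key),
        }
        for node in nodes
    ]


data = [
    {"field_a": 8, "field_c": "word_a", "field_d": True,
     "children": [
         {"field_a": 9, "field_c": "word_b", "field_d": False, "children": []},
     ]},
]

print(keep_fields_rec(data, ["field_c"]))
# [{'field_c': 'word_a', 'children': [{'field_c': 'word_b', 'children': []}]}]
```

Only the kept values are touched, so the cost scales with the number of fields kept rather than the number of fields present.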
And I wrote an iterative function:
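A sketch of an iterative equivalent, using an explicit stack of (source list, destination list) pairs in place of the call stack. As before, `keep_fields_iter`, `fields` and `child_key` are hypothetical names, and `children` is the assumed nesting key.

```python
def keep_fields_iter(nodes, fields, child_key="children"):
    """Iteratively build a new nested list holding only `fields`."""
    result = []
    stack = [(nodes, result)]  # pairs of (source list, destination list)
    while stack:
        src, dst = stack.pop()
        for node in src:
            new_node = {key: node[key] for key in fields if key in node}
            new_children = []
            new_node[child_key] = new_children
            dst.append(new_node)
            # Defer the children: they are filled in when this pair is popped.
            stack.append((node.get(child_key, []), new_children))
    return result


data = [
    {"field_a": 8, "field_c": "word_a", "field_d": True,
     "children": [
         {"field_a": 9, "field_c": "word_b", "field_d": False, "children": []},
     ]},
]

print(keep_fields_iter(data, ["field_c"]))
# [{'field_c': 'word_a', 'children': [{'field_c': 'word_b', 'children': []}]}]
```

Since no Python call frames are stacked, the depth this version can handle is limited by memory rather than by the interpreter's recursion limit.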
Without question, the recursive approach is more beautiful and easier to read. Let’s move on to the tests:
[Results table; columns: nb_levels, nb_children, nb_fields, nb_fields_kp]
Unfortunately, my (recursive!!) generator is killed at 3000 levels, so I can’t push the iterative function to its limits. Maybe it was off-topic since the iterative solution failed us…
So what’s my conclusion? Well, first of all, Mohiuddin Sumon’s solution is undoubtedly the best in terms of readability. But it uses the deepcopy method, which is particularly inefficient and fragile.
So, if your data is too big and the deepcopy function is killed, use my recursive function, which is less readable but still correct.
And finally, if your goal is efficiency (… stop using Python), the best choice would be the iterative approach, but here it’s downright ugly and unreadable.
In the end, I feel stupid for having done such tests on a high-level language whose aim is to be readable rather than efficient. But I wanted to try.