
I have this data structure:

[
    {
        'field_a': 8, 
        'field_b': 9, 
        'field_c': 'word_a', 
        'field_d': True, 
        'children': [
                        {
                            'field_a': 9, 
                            'field_b': 9, 
                            'field_c': 'word_b', 
                            'field_d': False, 
                            'children': [
                                            {
                                                'field_a': 9, 
                                                'field_b': 9, 
                                                'field_c': 'word_c', 
                                                'field_d': False, 
                                                'children': [
                                                           ]
                                            }
                                       ]
                        }
                    ]
    }
]

and I want to keep (for printing purposes) something like this:

[
    {
        'field_c': 'word_a', 
        'children': [
                        {
                            'field_c': 'word_b', 
                            'children': [
                                            {
                                                'field_c': 'word_c', 
                                                'children': [
                                                           ]
                                            }
                                       ]
                        }
                    ]
    }
]

What is the most pythonic way to achieve it?

I cannot modify the original data, but I can make a copy of it

2 Answers


  1. You can use a recursive function to traverse the nested structure and create a new dictionary with only the desired fields. Here’s an example:

    import copy
    
    def filter_data(data):
        filtered_data = []
        for item in data:
            new_item = {'field_c': item['field_c']}
            if 'children' in item:
                new_item['children'] = filter_data(item['children'])
            filtered_data.append(new_item)
        return filtered_data
    
    original_data = [...]  # your original data
    filtered_data = filter_data(copy.deepcopy(original_data))
    print(filtered_data)
    
    

    In your example you only need one specific field, and for the nesting you can see here

    new_item['children'] = filter_data(item['children'])

    that we are calling our function recursively.

    I think this is a Pythonic approach.
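
    For example, applied to a reduced version of the data from the question (assuming the nested key is spelled 'children' consistently), it would produce the trimmed structure directly:

    original_data = [
        {'field_a': 8, 'field_b': 9, 'field_c': 'word_a', 'field_d': True,
         'children': [
             {'field_a': 9, 'field_b': 9, 'field_c': 'word_b', 'field_d': False,
              'children': []}
         ]}
    ]
    print(filter_data(original_data))
    # [{'field_c': 'word_a', 'children': [{'field_c': 'word_b', 'children': []}]}]

    As far as I can tell, filter_data only builds new dictionaries and never mutates the items it reads, so the deepcopy is mainly a belt-and-braces precaution here.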

  2. Mohiuddin Sumon’s answer may be Pythonic and a pleasure to read, but I’ve often been told, and my own experience confirms it, that Python doesn’t like recursion; no, it hates it. Yes, it’s easy to read, but I wanted to compare the recursive and iterative approaches, at least for this question, which lends itself very well to it…

    I therefore began by generating data shaped like the structure in the question:

    def generate(nb_levels, nb_children, fields):
        if nb_levels == 0:
            return []
        g = generate(nb_levels - 1, nb_children, fields)
        return [
            {
                field: i for field in fields
            } | {'children': g}
            for i in range(nb_children)
        ]
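
    One detail worth noting, if I read generate correctly: the same child list g is reused for every sibling, so the result is really a DAG that the filter functions below traverse as if it were a full tree. A quick check:

    tree = generate(2, 2, ['field_a', 'field_b'])
    print(tree[0]['children'] is tree[1]['children'])  # True: siblings share one child list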
    

    And I modified the approach by adding a parameter to select which fields to keep. I quickly realized that using deepcopy is quite expensive, and that it’s better not to copy the data in its entirety, especially when there are many initial fields and very few fields to keep.

    So I rewrote the recursive function:

    def filter_data_recursive(data, fields_to_conserve):
        # Rebuild each node with only the requested fields, then merge in the
        # recursively filtered children (if any) using the dict-union operator.
        return [
            {field: item[field] for field in fields_to_conserve} |
            ({'children': filter_data_recursive(item['children'], fields_to_conserve)}
             if 'children' in item else {})
            for item in data
        ]
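
    The | dict-merge used above requires Python 3.9+. And just to illustrate the deepcopy cost, here is a rough, made-up comparison rather than a rigorous benchmark (the flat data below is invented purely for the illustration):

    import copy
    import timeit

    # Made-up flat data: many nodes, many fields, only one field worth keeping.
    flat = [{f'field_{k}': k for k in range(20)} | {'children': []}
            for _ in range(100_000)]

    # Deep-copying everything vs. rebuilding only the kept field.
    print(timeit.timeit(lambda: copy.deepcopy(flat), number=1))
    print(timeit.timeit(lambda: filter_data_recursive(flat, ['field_0']), number=1))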
    

    And I wrote an iterative function:

    def filter_data_iterative(data, fields_to_conserve):
        stack = [(data, None)]
        result = [] 
        mapping = {id(data): result}
    
        while stack:
            current, parent = stack.pop()
            for item in current:
                # Copy only the kept fields of the node itself
                new_item = {}
                for field in fields_to_conserve:
                    new_item[field] = item[field]
                
                # Attach the node to its parent
                if parent is None:
                    mapping[id(current)].append(new_item)
                else:
                    parent.append(new_item)
                
                # Link the node to its children
                if 'children' in item:
                    new_item['children'] = []
                    stack.append((item['children'], new_item['children']))
                    mapping[id(item['children'])] = new_item['children']
        
        return result
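
    Just as a sanity check (this was not part of my benchmark): the iterative version also survives a degenerate, very deep chain that the recursive versions cannot handle under Python's default recursion limit of 1000:

    # Build a 10_000-level deep chain without any recursion.
    deep = [{'field_c': 'leaf', 'children': []}]
    for i in range(10_000):
        deep = [{'field_c': f'level_{i}', 'children': deep}]

    trimmed = filter_data_iterative(deep, ['field_c'])    # fine at any depth
    # filter_data_recursive(deep, ['field_c'])            # would raise RecursionError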
    

    Without question, the recursive approach is more beautiful and easier to read. Let’s move on to the tests:

    nb_levels  nb_children  nb_fields  nb_fields_kp  Iterative time  Recursive time
    5          15           5          1             2.550           2.702
    5          15           5          3             2.717           2.960
    5          15           5          5             3.296           3.397
    8          5            5          1             1.568           2.109
    8          5            5          3             1.507           1.917
    8          5            5          5             1.962           2.961
    10         4            5          1             5.228           7.764
    10         4            5          3             6.334           8.363
    10         4            5          5             6.501           8.980
    18         2            5          1             2.580           3.332
    18         2            5          3             2.930           3.689
    18         2            5          5             3.015           3.955
    100        1            5          1             3.060 (10000)   3.241 (10000)
    100        1            5          3             3.661 (10000)   3.894 (10000)
    100        1            5          5             3.805 (10000)   4.491 (10000)
    200        1            5          1             3.203 (5000)    4.242 (5000)
    200        1            5          3             3.829 (5000)    4.314 (5000)
    200        1            5          5             4.227 (5000)    4.652 (5000)
    500        1            5          1             1.694 (1000)    2.322 (1000)
    500        1            5          3             1.672 (1000)    2.293 (1000)
    500        1            5          5             2.072 (1000)    2.981 (1000)
    1000       1            5          1             3.124 (1000)    killed
    1000       1            5          3             3.144 (1000)    killed
    1000       1            5          5             3.884 (1000)    killed
    2000       1            5          1             3.553 (500)     killed
    2000       1            5          3             3.720 (500)     killed
    2000       1            5          5             3.934 (500)     killed

    Unfortunately, my (recursive!!) generator is killed at 3000 levels, so I can’t push the iterative function to its limits. Maybe that is beside the point anyway, since it is the recursive generator that failed us, not the iterative filter…

    So what’s my conclusion? Well, first of all, Mohiuddin Sumon’s solution is undoubtedly the best in terms of readability. But it uses deepcopy, which is particularly inefficient and fragile here.
    So, if your data is too big and the deepcopy-based version gets killed, use my recursive function, which is less readable but still correct.
    And finally, if your goal is raw efficiency (… then stop using Python), the best would be the iterative approach, but it is admittedly hideous and hard to read.
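
    (As an aside, not a recommendation: if you want to push the recursive versions a bit deeper anyway, Python lets you raise the limit with sys.setrecursionlimit, at the risk of crashing the interpreter outright if you set it too high.)

    import sys
    sys.setrecursionlimit(10_000)   # default is 1000; raise with caution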

    In the end, I feel stupid for having done such tests on a high-level language whose aim is to be readable rather than efficient. But I wanted to try.
