skip to Main Content

I have the following nested dictionary with lists inside:

[{
    "id": 467,
    "status": 2,
    "leavePeriod": {
        "owner": {
            "employeeNumber": "2620",
            "firstName": "fn_467",
            "lastName": "ln_467"
        },
        "ownerId": 46,
        "leaves": [
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-06T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-06T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-07T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-07T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-11T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-04-11T00:00:00"
            }
        ]
    }
},
{
    "id": 477,
    "status": 2,
    "leavePeriod": {
        "owner": {
            "employeeNumber": "2522",
            "firstName": "fn_477",
            "lastName": "lm_477"
        },
        "ownerId": 41,
        "leaves": [
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-03-13T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1121,
                    "name": "Vacation days 2021/2022",
                    "url": "https://some_link/1121"
                },
                "date": "2023-03-13T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-14T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-14T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-15T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-15T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-16T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-16T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-17T00:00:00"
            },
            {
                "leaveAccount": {
                    "id": 1323,
                    "name": "RTT 2023",
                    "url": "https://some_link/1323"
                },
                "date": "2023-03-17T00:00:00"
            }
        ]
    }
}]

How can I flatten it and convert it to a dataframe using this library?

from pandas.io.json import json_normalize

I’d like this to be done in one step if possible, instead of flattening to a dataframe twice and then merging them
Thanks in advance.

2

Answers


  1. Just use pd.json_normalize(<put_json_here>).

    Login or Signup to reply.
  2. Using pandas.json_normalize

    Note: from pandas.io.json import json_normalize results in the same output on my machine but raises a FutureWarning.

    Data

    import pandas as pd  # 1.5.1
    
    
    data = [{
        "id": 467,
        "status": 2,
        "leavePeriod": {
            "owner": {
                "employeeNumber": "2620",
                "firstName": "fn_467",
                "lastName": "ln_467"
            },
            "ownerId": 46,
            "leaves": [
                {
                    "leaveAccount": {
                        "id": 1121,
                        "name": "Vacation days 2021/2022",
                        "url": "https://some_link/1121"
                    },
                    "date": "2023-04-06T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1121,
                        "name": "Vacation days 2021/2022",
                        "url": "https://some_link/1121"
                    },
                    "date": "2023-04-06T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1121,
                        "name": "Vacation days 2021/2022",
                        "url": "https://some_link/1121"
                    },
                    "date": "2023-04-07T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1121,
                        "name": "Vacation days 2021/2022",
                        "url": "https://some_link/1121"
                    },
                    "date": "2023-04-07T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1121,
                        "name": "Vacation days 2021/2022",
                        "url": "https://some_link/1121"
                    },
                    "date": "2023-04-11T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1121,
                        "name": "Vacation days 2021/2022",
                        "url": "https://some_link/1121"
                    },
                    "date": "2023-04-11T00:00:00"
                }
            ]
        }
    },
    {
        "id": 477,
        "status": 2,
        "leavePeriod": {
            "owner": {
                "employeeNumber": "2522",
                "firstName": "fn_477",
                "lastName": "lm_477"
            },
            "ownerId": 41,
            "leaves": [
                {
                    "leaveAccount": {
                        "id": 1121,
                        "name": "Vacation days 2021/2022",
                        "url": "https://some_link/1121"
                    },
                    "date": "2023-03-13T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1121,
                        "name": "Vacation days 2021/2022",
                        "url": "https://some_link/1121"
                    },
                    "date": "2023-03-13T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1323,
                        "name": "RTT 2023",
                        "url": "https://some_link/1323"
                    },
                    "date": "2023-03-14T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1323,
                        "name": "RTT 2023",
                        "url": "https://some_link/1323"
                    },
                    "date": "2023-03-14T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1323,
                        "name": "RTT 2023",
                        "url": "https://some_link/1323"
                    },
                    "date": "2023-03-15T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1323,
                        "name": "RTT 2023",
                        "url": "https://some_link/1323"
                    },
                    "date": "2023-03-15T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1323,
                        "name": "RTT 2023",
                        "url": "https://some_link/1323"
                    },
                    "date": "2023-03-16T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1323,
                        "name": "RTT 2023",
                        "url": "https://some_link/1323"
                    },
                    "date": "2023-03-16T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1323,
                        "name": "RTT 2023",
                        "url": "https://some_link/1323"
                    },
                    "date": "2023-03-17T00:00:00"
                },
                {
                    "leaveAccount": {
                        "id": 1323,
                        "name": "RTT 2023",
                        "url": "https://some_link/1323"
                    },
                    "date": "2023-03-17T00:00:00"
                }
            ]
        }
    }]
    

    Option 1

    # write out your meta data columns and record path
    df = pd.json_normalize(
        data=data,
        meta=[
            "id",
            "status",
            ["leavePeriod", "owner", "employeeNumber"],
            ["leavePeriod", "owner", "firstName"],
            ["leavePeriod", "owner", "lastName"],
            ["leavePeriod", "ownerId"]
        ],
        record_path=["leavePeriod", "leaves"]
    )
    

    Or if you’re lazy and don’t wanna split all the meta columns up by hand…

    df = pd.json_normalize(
        data=data,
        meta=[
            "id",
            "status",
            "leavePeriod.owner.employeeNumber".split("."),
            "leavePeriod.owner.firstName".split("."),
            "leavePeriod.owner.lastName".split("."),
            "leavePeriod.ownerId".split("."),
        ],
        record_path="leavePeriod.leaves".split(".")
    )
    
    df.sample(5)
    

    json_normalize out 1
    json_normalize out 2

    Parameter Descriptions + Explanations

    • record_path: str or list of str, default None

      Path in each object to list of records. If not passed, data will be assumed to be an array of records.

    I set record_path to "leavePeriod.leaves".split(".") because that was the (nested) key(s) in the dictionary that held another layer of records, i.e. array/list of dictionaries. Leaving it blank leads pandas to assume that data is an array of records and has no further nesting.

    • meta: list of paths (str or list of str), default None

      Fields to use as metadata for each record in resulting table.

    If you set record_path without setting meta pandas will only normalize and extract the record_path. To keep the upper-level (i.e. not-so-nested) keys, we need to specify them as meta data.

    Option 2

    df = pd.json_normalize(data)
    
    df
    

    output1

    To "flatten" the leavePeriod.leaves even further…

    # explode the column and ignore index
    df = df.explode(column="leavePeriod.leaves", ignore_index=True)
    
    df.sample(5)  # taking a sample to highlight what happened
    

    exploded output

    # convert df["leavePeriod.leaves"] to a list and use `pd.json_normalize` again
    leavePeriod = pd.json_normalize(df["leavePeriod.leaves"].to_list())
    
    leavePeriod.sample(5)
    

    leavePeriod output

    # join `leavePeriod` with `df` 
    # and drop old "leavePeriod.leaves" columns
    df = (
        df.join(leavePeriod)
          .drop(columns="leavePeriod.leaves")
    )
    
    df.sample(5)
    

    Note: you can’t see all the columns in the image because it’s too wide so I have two images

    joined output 1
    joined output 2

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search