
I’m trying to loop through a JSON document in which several objects share the same key names. I need to extract each one and keep them distinguishable. Below is my JSON; I need to extract the expenses, dt_posted and description values from each object in the "results" list:

{
    "count": 1,
    "next": null,
    "previous": null,
    "results": [
        {
            "expenses": "1920000.00",
            "dt_posted": "2022-10-20T21:53:30-04:00",
            "lobbying_activities": [
                {
                    "description": "Providing information related to Apple Pay"
                },
                {
                    "description": "Issues related to transparency and government access to data, including H.R. 7072/S. 4373, the Non-Disclosure Order (NDO) Fairness Act"
                }
            ]
        },
        {
            "expenses": "178888.00",
            "dt_posted": "2022-10-15T21:53:30-04:00",
            "lobbying_activities": [
                {
                    "description": "Issues related tothe requirements of E.O. 14028, an Executive Order on Improving the Nation's Cybersecurity Issues related to cybersecurity requirements in H.R. 7900, the National Defense Authorization Act for Fiscal Year 2023"
                }
            ]
        }
    ]
}

My code attempts to loop through the JSON and collect those values into a list so I can process them further. However, I’m getting the error "unhashable type: 'dict'":

import requests
import pandas as pd

url = "https://api.npoint.io/1cae29b5fc8900f6cc5a"

r = requests.get(url)

df = pd.json_normalize(r.json())

z =[]

for x in df['results']:
    if df['results'][x] == 'expenses':
        z.append(x)
for x in df['results']:
    if df['results'][x] == 'dt_posted':
        z.append(x)
for x in df['results']:
    if df['results'][x] == 'description':
        z.append(x)

My ideal output should hold one dataset containing the first "expenses", "dt_posted" and "description" and then the second dataset holding the "expenses", "dt_posted" and "description" from the second object inside the JSON.

2 Answers


    1. Read the JSON object.
    2. Iterate over "results".
    3. Build a row from each result, using the first object in the list under the lobbying_activities key as the third value:
    def json_result_to_row(result: dict) -> tuple:
        # Note: the third item is the whole activity dict; add ["description"] to get only the text.
        return result["expenses"], result["dt_posted"], result["lobbying_activities"][0]
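
    The three steps above can be sketched in plain Python without pandas. This is only an illustration: the `data` dict is a trimmed stand-in for the question's JSON (descriptions shortened), and `json_result_to_rows` is a hypothetical variant of the helper that returns one tuple per activity instead of only the first one.

```python
# Trimmed stand-in for the JSON in the question (descriptions shortened).
data = {
    "count": 1,
    "results": [
        {
            "expenses": "1920000.00",
            "dt_posted": "2022-10-20T21:53:30-04:00",
            "lobbying_activities": [
                {"description": "Providing information related to Apple Pay"},
                {"description": "Issues related to transparency and government access to data"},
            ],
        },
        {
            "expenses": "178888.00",
            "dt_posted": "2022-10-15T21:53:30-04:00",
            "lobbying_activities": [
                {"description": "Issues related to cybersecurity requirements"},
            ],
        },
    ],
}


def json_result_to_rows(result: dict) -> list:
    """Hypothetical helper: one (expenses, dt_posted, description) tuple per activity."""
    return [
        (result["expenses"], result["dt_posted"], activity["description"])
        for activity in result.get("lobbying_activities", [])
    ]


# Steps 2 and 3: iterate over "results" and collect every row.
rows = [row for result in data["results"] for row in json_result_to_rows(result)]

for row in rows:
    print(row)
```

    Returning a list per result keeps every description, so a result with two activities yields two rows that share the same expenses and dt_posted.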
    
  1. You should pass record_path as an argument to json_normalize:

    df = pd.json_normalize(r.json(), record_path="results")
    

    This will give you a readable df:

         expenses                  dt_posted                                lobbying_activities
    0  1920000.00  2022-10-20T21:53:30-04:00  [{'description': 'Providing information relate...
    1   178888.00  2022-10-15T21:53:30-04:00  [{'description': 'Issues related tothe require...
    

    Edit: at this point you can export your df to json:

    df.to_json(indent=4, orient="records")
    

    Output:

    [
        {
            "expenses":"1920000.00",
            "dt_posted":"2022-10-20T21:53:30-04:00",
            "lobbying_activities":[
                {
                    "description":"Providing information related to Apple Pay"
                },
                {
                    "description":"Issues related to transparency and government access to data, including H.R. 7072/S. 4373, the Non-Disclosure Order (NDO) Fairness Act"
                }
            ]
        },
        {
            "expenses":"178888.00",
            "dt_posted":"2022-10-15T21:53:30-04:00",
            "lobbying_activities":[
                {
                    "description":"Issues related tothe requirements of E.O. 14028, an Executive Order on Improving the Nation's Cybersecurity Issues related to cybersecurity requirements in H.R. 7900, the National Defense Authorization Act for Fiscal Year 2023"
                }
            ]
        }
    ]
    

    Now since you can have more than one description per row, you can explode the "lobbying_activities" column and concat the first two columns with a new df made of "lobbying_activities":

    df = df.explode("lobbying_activities").reset_index()
    df = pd.concat([
        df[["expenses", "dt_posted"]],
        pd.DataFrame(df["lobbying_activities"].values.tolist())
        ], axis=1)
    

    Output:

         expenses                  dt_posted                                        description
    0  1920000.00  2022-10-20T21:53:30-04:00         Providing information related to Apple Pay
    1  1920000.00  2022-10-20T21:53:30-04:00  Issues related to transparency and government ...
    2   178888.00  2022-10-15T21:53:30-04:00  Issues related tothe requirements of E.O. 1402...
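
    As a possible alternative to explode plus concat, json_normalize can flatten the nested list directly when it is given the list of results, with record_path pointing at "lobbying_activities" and meta naming the parent fields to carry along. The sketch below uses a trimmed inline copy of the data (descriptions shortened) instead of the URL, and then splits the flat frame into one dataset per original result. The split assumes that the pair (expenses, dt_posted) is unique per result, which holds for this sample but is not guaranteed in general.

```python
import pandas as pd

# Trimmed stand-in for r.json()["results"] (descriptions shortened).
results = [
    {
        "expenses": "1920000.00",
        "dt_posted": "2022-10-20T21:53:30-04:00",
        "lobbying_activities": [
            {"description": "Providing information related to Apple Pay"},
            {"description": "Issues related to transparency and government access to data"},
        ],
    },
    {
        "expenses": "178888.00",
        "dt_posted": "2022-10-15T21:53:30-04:00",
        "lobbying_activities": [
            {"description": "Issues related to cybersecurity requirements"},
        ],
    },
]

# One row per activity; meta copies the parent fields onto each row.
flat = pd.json_normalize(
    results,
    record_path="lobbying_activities",
    meta=["expenses", "dt_posted"],
)
flat = flat[["expenses", "dt_posted", "description"]]

# One DataFrame per original result (assumes the key pair is unique per result).
datasets = [
    group.reset_index(drop=True)
    for _, group in flat.groupby(["expenses", "dt_posted"], sort=False)
]

for dataset in datasets:
    print(dataset, end="\n\n")
```

    Passing sort=False keeps the groups in the order in which they first appear, so the first dataset corresponds to the first object in "results".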
    