skip to Main Content

I am trying to replace a value in a pandas dataframe cell that I am constructing from a JSON file that I get from Airtable API the JSON file looks like this one:

[
    {
        "id": "rec0Y7ydscZnU5NjZ",
        "createdTime": "2023-03-17T16:32:57.000Z",
        "fields": {
            "field_id": [
                "recq6vsFWI2H8j84i"
            ],
            "field_name": "S-166",
            "currency": [
                "Dolares"
            ],
            "state": [
                5
            ],
            "roi": {
                "specialValue": "Infinity"
            },
            "investment_return": {
                "specialValue": "NaN"
            },
            "value_amount": 16559.51178956216,
            "income": [
                31978.5
            ]
        }       
    }
]

Obviously this is a sample and I am replacing the real keys and some values, because they are from a private application, but what I am trying to do once I get the API response and then convert the JSON into a dataframe using the code below, is to replace the values {"specialValue": "Infinity"} from roi key and {"specialValue": "NaN"} from investment_return key with zeros:

#read_json.py
def read_json():
    # Open the JSON file
    with open('json_test.json', 'r') as file:
    # Parse the JSON data
        data = json.load(file)

        # Extract all fields from the "fields" dictionary and create a DataFrame
        df1 = pd.DataFrame([record['id'] for record in data],columns=['id'])
        df2 = pd.DataFrame([record['fields'] for record in data])
        
        df = pd.concat([df1,df2], axis=1)
        
    return df

Then once I have the JSON data converted into a dataframe I am trying to replace the values using this code:

df.replace('{"specialValue": "Infinity"}', np.nan, inplace=True)

For some reason is not working:

Here I am using the read_json() function I described above to generate the dataframe from the JSON file gotten from API response, and inspecting the resulting dataframe:

df = read_json()
print(df)
                  id             field_id field_name   currency state  
0  rec0Y7ydscZnU5NjZ  [recq6vsFWI2H8j84i]      S-166  [Dolares]   [5]   

                            roi        investment_return  value_amount  
0  {'specialValue': 'Infinity'}  {'specialValue': 'NaN'}   16559.51179   

      income  
0  [31978.5] 

Then once I want to replace() for example the string {"specialValue": "Infinity"} by NaN with numpy:

# replace string value with NaN
df.replace('{"specialValue": "Infinity"}', np.nan, inplace=True)
print(df)

what I would expect is to get for the column roi an nan value, but I am still getting the same string that I receive from the API:

0  rec0Y7ydscZnU5NjZ  [recq6vsFWI2H8j84i]      S-166  [Dolares]   [5]   

                            roi        investment_return  value_amount  
0  {'specialValue': 'Infinity'}  {'specialValue': 'NaN'}   16559.51179   

      income  
0  [31978.5]  

Can somebody give me a possible approach on how to convert those into NaN or zeros?

2

Answers


  1. After parsing your JSON, you no longer have strings, but objects.

    In particular, you can check that:

    type(df.at[0, 'investment_return'])
    # dict
    

    You could take advantage of the mutability of dictionaries and of the map method (applymap before pandas 2.1):

    def replace_in_dict(x):
        if isinstance(x, dict) and 'specialValue' in x and x['specialValue'] == 'Infinity':
            x['specialValue'] = 'NaN'
    
    df.map(replace_in_dict)
    
    print(df)
    

    Modified df:

                      id             field_id field_name   currency state                      roi        investment_return  value_amount     income
    0  rec0Y7ydscZnU5NjZ  [recq6vsFWI2H8j84i]      S-166  [Dolares]   [5]  {'specialValue': 'NaN'}  {'specialValue': 'NaN'}   16559.51179  [31978.5]
    
    Login or Signup to reply.
  2. Easier to change the values in the dictionary before constructing the DataFrame.

    For example:

    import json
    
    data = '''
    [
        {
            "id": "rec0Y7ydscZnU5NjZ",
            "createdTime": "2023-03-17T16:32:57.000Z",
            "fields": {
                "field_id": [
                    "recq6vsFWI2H8j84i"
                ],
                "field_name": "S-166",
                "currency": [
                    "Dolares"
                ],
                "state": [
                    5
                ],
                "roi": {
                    "specialValue": "Infinity"
                },
                "investment_return": {
                    "specialValue": "NaN"
                },
                "value_amount": 16559.51178956216,
                "income": [
                    31978.5
                ]
            }       
        }
    ]
    '''
    
    for datum in json.loads(data):
        fields = datum.get("fields", {})
        for k in "roi", "investment_return":
            dk = fields.get(k, {})
            if dk.get("specialValue") in {"Infinity", "NaN"}:
                dk["specialValue"] = 0
        print(json.dumps(datum, indent=2))
    

    Output:

    {
      "id": "rec0Y7ydscZnU5NjZ",
      "createdTime": "2023-03-17T16:32:57.000Z",
      "fields": {
        "field_id": [
          "recq6vsFWI2H8j84i"
        ],
        "field_name": "S-166",
        "currency": [
          "Dolares"
        ],
        "state": [
          5
        ],
        "roi": {
          "specialValue": 0
        },
        "investment_return": {
          "specialValue": 0
        },
        "value_amount": 16559.51178956216,
        "income": [
          31978.5
        ]
      }
    }
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search