skip to Main Content

It should be a simple line of code using pd.json_normalize function but it’s working only with a single string and it’s not batch processing my whole column

Orginial dataframe

enter image description here

df['addresses'][0]

[{'addressLine1': '124 Main Street',
  'addressLine2': '',
  'addressLine3': '',
  'city': 'Portland',
  'region': 'ME',
  'postalCode': '04019',
  'country': 'USA'}]
test = pd.json_normalize(result['addresses'][0])
test

enter image description here

Everything up to this point works, but when I use the function and apply to the whole column, the resulting dataframe turned out to look like this.

test = pd.json_normalize(result['addresses'])
test

enter image description here

Here are some column data:

[[{'addressLine1': '124 Main Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '1234 Main Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Chattanooga',
   'region': 'TN',
   'postalCode': '37402',
   'country': 'USA'}],
 [{'addressLine1': '1684151 Chair Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Notaplace',
   'region': 'AL',
   'postalCode': '48835',
   'country': 'USA'}],
 [{'addressLine1': '136 Main Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '22118',
   'country': 'USA'}],
 [{'addressLine1': '123452 HoneyDo LN',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '123 Main Street',
   'addressLine2': 'Apt 2B',
   'addressLine3': 'Building B',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '123 Main Street',
   'addressLine2': 'Apt 2B',
   'addressLine3': 'Building B',
   'city': 'New York City',
   'region': 'NY',
   'postalCode': '10001',
   'country': 'USA'}],
 [{'addressLine1': '123 Main Street',
   'addressLine2': 'Apt 2B',
   'addressLine3': 'Building B',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '4578 Shiver Me Timbers Road',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '124 Main ST',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'PORTLAND',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}]]

2

Answers


  1. If I understand you correctly, you can transform your dataframe df with dict data with following example:

    df = pd.concat([df, df.pop('addresses').str[0].apply(pd.Series)], axis=1)
    print(df)
    

    Prints:

                      addressLine1 addressLine2 addressLine3           city region postalCode country
    0              124 Main Street                                 Portland     ME      04019     USA
    1             1234 Main Street                              Chattanooga     TN      37402     USA
    2         1684151 Chair Street                                Notaplace     AL      48835     USA
    3              136 Main Street                                 Portland     ME      22118     USA
    4            123452 HoneyDo LN                                 Portland     ME      04019     USA
    5              123 Main Street       Apt 2B   Building B       Portland     ME      04019     USA
    6              123 Main Street       Apt 2B   Building B  New York City     NY      10001     USA
    7              123 Main Street       Apt 2B   Building B       Portland     ME      04019     USA
    8  4578 Shiver Me Timbers Road                                 Portland     ME      04019     USA
    9                  124 Main ST                                 PORTLAND     ME      04019     USA
    
    Login or Signup to reply.
  2. It seems your list has one-element lists as elements.

    Lets say your list is address_list then you get the first element in that list and then use json_normalize

    pd.json_normalize([e[0] for e in address_list])
    

    If the test data that you posted is actually a column then just use:

    pd.json_normalize(result["addresses"].str[0])
    

    Or if you have other columns in addition to addresses in your result dataframe:

    pd.concat(
        [result.drop(column="addresses"), pd.json_normalize(result["addresses"].str[0])],
        axis=1
    )
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search