Question posted in Json
Our archive of expertly curated questions and answers provides insights and solutions to common problems related to this popular data interchange format. From parsing and manipulating JSON data to integrating it with various programming languages and web services, our archive has got you covered. Start exploring today and take your JSON skills to the next level

Json – Python flatten a dictionary column

YoungZ
March 1, 2023
309 views
0 votes
2 Answers

It should be a simple line of code using pd.json_normalize function but it’s working only with a single string and it’s not batch processing my whole column

Orginial dataframe

df['addresses'][0]

[{'addressLine1': '124 Main Street',
  'addressLine2': '',
  'addressLine3': '',
  'city': 'Portland',
  'region': 'ME',
  'postalCode': '04019',
  'country': 'USA'}]

test = pd.json_normalize(result['addresses'][0])
test

Everything up to this point works, but when I use the function and apply to the whole column, the resulting dataframe turned out to look like this.

test = pd.json_normalize(result['addresses'])
test

Here are some column data:

[[{'addressLine1': '124 Main Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '1234 Main Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Chattanooga',
   'region': 'TN',
   'postalCode': '37402',
   'country': 'USA'}],
 [{'addressLine1': '1684151 Chair Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Notaplace',
   'region': 'AL',
   'postalCode': '48835',
   'country': 'USA'}],
 [{'addressLine1': '136 Main Street',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '22118',
   'country': 'USA'}],
 [{'addressLine1': '123452 HoneyDo LN',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '123 Main Street',
   'addressLine2': 'Apt 2B',
   'addressLine3': 'Building B',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '123 Main Street',
   'addressLine2': 'Apt 2B',
   'addressLine3': 'Building B',
   'city': 'New York City',
   'region': 'NY',
   'postalCode': '10001',
   'country': 'USA'}],
 [{'addressLine1': '123 Main Street',
   'addressLine2': 'Apt 2B',
   'addressLine3': 'Building B',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '4578 Shiver Me Timbers Road',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'Portland',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}],
 [{'addressLine1': '124 Main ST',
   'addressLine2': '',
   'addressLine3': '',
   'city': 'PORTLAND',
   'region': 'ME',
   'postalCode': '04019',
   'country': 'USA'}]]

Answers

If I understand you correctly, you can transform your dataframe df with dict data with following example:

df = pd.concat([df, df.pop('addresses').str[0].apply(pd.Series)], axis=1)
print(df)

Prints:

                  addressLine1 addressLine2 addressLine3           city region postalCode country
0              124 Main Street                                 Portland     ME      04019     USA
1             1234 Main Street                              Chattanooga     TN      37402     USA
2         1684151 Chair Street                                Notaplace     AL      48835     USA
3              136 Main Street                                 Portland     ME      22118     USA
4            123452 HoneyDo LN                                 Portland     ME      04019     USA
5              123 Main Street       Apt 2B   Building B       Portland     ME      04019     USA
6              123 Main Street       Apt 2B   Building B  New York City     NY      10001     USA
7              123 Main Street       Apt 2B   Building B       Portland     ME      04019     USA
8  4578 Shiver Me Timbers Road                                 Portland     ME      04019     USA
9                  124 Main ST                                 PORTLAND     ME      04019     USA

- SomeDude
- March 1, 2023 at 1:53 am
- 0 votes
0
It seems your list has one-element lists as elements.

Lets say your list is address_list then you get the first element in that list and then use json_normalize
```
pd.json_normalize([e[0] for e in address_list])
```
If the test data that you posted is actually a column then just use:
```
pd.json_normalize(result["addresses"].str[0])
```
Or if you have other columns in addition to addresses in your result dataframe:
```
pd.concat(
    [result.drop(column="addresses"), pd.json_normalize(result["addresses"].str[0])],
    axis=1
)
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.