Question posted in Json

Json – how to create a table from a set of arrays with dictionaries

doriam
March 14, 2023
232 views
2 votes
3 Answers

In my data, the column df['comments'] has 88107 rows, and each row holds an array (list) of dictionaries:

`
[{'text': "It will be curious to see where this heads in the long run.  CBS is on a tear but will it fit their image, will they try and establish control, overall agenda.  I've enjoyed last.fm for many years supporting through paypal donations each time I expire...it'll be interesting.",
  'score': 0},
 {'text': "Does this mean that there's now a big-name company who will fight for the repeal of the recent streaming-music royalty hike?",
  'score': 1},
 {'text': 'Also on BBC News:  http://news.bbc.co.uk/1/low/technology/6701863.stm .Nice to see a London-based co. hit the headlines.',
  'score': 2},
 {'text': "I don't understand what they do that is worth $70M a year. ",
  'score': 3},
 {'text': 'sold out too cheaply. given their leadership position, they should have ask for at least $500m',
  'score': 4}]
`

How can I make one big table with the columns 'text' and 'score'?

`l=pd.DataFrame()
for i in range(len(df['comments'][0])+1):
    n1=pd.DataFrame(data=df['comments'][i])
    l=pd.concat([l, n1], axis=0)`

I was able to make a table from one array, but I can't process all 88000 rows.

Answers

If each value in comments is a list of dictionaries, flatten them into a single list and pass it to the DataFrame constructor:

L = [{'text': "It will be curious to see where this heads in the long run.  CBS is on a tear but will it fit their image, will they try and establish control, overall agenda.  I've enjoyed last.fm for many years supporting through paypal donations each time I expire...it'll be interesting.",
  'score': 0},
 {'text': "Does this mean that there's now a big-name company who will fight for the repeal of the recent streaming-music royalty hike?",
  'score': 1},
 {'text': 'Also on BBC News:  http://news.bbc.co.uk/1/low/technology/6701863.stm .Nice to see a London-based co. hit the headlines.',
  'score': 2},
 {'text': "I don't understand what they do that is worth $70M a year. ",
  'score': 3},
 {'text': 'sold out too cheaply. given their leadership position, they should have ask for at least $500m',
  'score': 4}]

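# Sample data: three rows, each holding the same list of dictionaries
df = pd.DataFrame({'comments': [L, L, L]})

# Flatten every list into one sequence of dictionaries and build the table from it
df = pd.DataFrame([y for x in df['comments'] for y in x])
print(df)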
                                                 text  score
0   It will be curious to see where this heads in ...      0
1   Does this mean that there's now a big-name com...      1
2   Also on BBC News:  http://news.bbc.co.uk/1/low...      2
3   I don't understand what they do that is worth ...      3
4   sold out too cheaply. given their leadership p...      4
5   It will be curious to see where this heads in ...      0
6   Does this mean that there's now a big-name com...      1
7   Also on BBC News:  http://news.bbc.co.uk/1/low...      2
8   I don't understand what they do that is worth ...      3
9   sold out too cheaply. given their leadership p...      4
10  It will be curious to see where this heads in ...      0
11  Does this mean that there's now a big-name com...      1
12  Also on BBC News:  http://news.bbc.co.uk/1/low...      2
13  I don't understand what they do that is worth ...      3
14  sold out too cheaply. given their leadership p...      4
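
With 88107 rows, some cells may be empty or missing rather than holding a list. The following is a minimal sketch of the same flattening idea that skips such cells; the sample data and the assumption that missing cells appear as None or an empty list are illustrative and not taken from the question.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample: one row has no comments (None) and one has an empty list.
df = pd.DataFrame({
    "comments": [
        [{"text": "first", "score": 0}, {"text": "second", "score": 1}],
        None,
        [],
        [{"text": "third", "score": 2}],
    ]
})

# Keep only the cells that actually hold a list, then chain them
# into one flat iterable of dictionaries.
rows = chain.from_iterable(c for c in df["comments"] if isinstance(c, list))

# Passing columns= fixes the column order of the resulting table.
flat = pd.DataFrame(list(rows), columns=["text", "score"])
print(flat)
```

Building the table once from a flat list also avoids calling pd.concat inside a loop, which copies the growing table on every iteration.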

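If each row of comments holds a single dictionary (as in the output below) rather than a list of dictionaries, you can use pd.json_normalize: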

>>> pd.concat([df, pd.json_normalize(df['comments'])], axis=1)

                                            comments                                               text  score
0  {'text': 'It will be curious to see where this...  It will be curious to see where this heads in ...      0
1  {'text': 'Does this mean that there's now a bi...  Does this mean that there's now a big-name com...      1
2  {'text': 'Also on BBC News:  http://news.bbc.c...  Also on BBC News:  http://news.bbc.co.uk/1/low...      2
3  {'text': 'I don't understand what they do that...  I don't understand what they do that is worth ...      3
4  {'text': 'sold out too cheaply. given their le...  sold out too cheaply. given their leadership p...      4

Or use DataFrame constructor:

>>> pd.concat([df, pd.DataFrame(df['comments'].tolist())], axis=1)
                                            comments                                               text  score
0  {'text': 'It will be curious to see where this...  It will be curious to see where this heads in ...      0
1  {'text': 'Does this mean that there's now a bi...  Does this mean that there's now a big-name com...      1
2  {'text': 'Also on BBC News:  http://news.bbc.c...  Also on BBC News:  http://news.bbc.co.uk/1/low...      2
3  {'text': 'I don't understand what they do that...  I don't understand what they do that is worth ...      3
4  {'text': 'sold out too cheaply. given their le...  sold out too cheaply. given their leadership p...      4

If you want to remove the comments column, replace df['comments'] with df.pop('comments'):

>>> pd.concat([df, pd.json_normalize(df.pop('comments'))], axis=1)
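
When each cell holds a list of dictionaries, as in the question, one possible way to get to the one-dictionary-per-row shape first is DataFrame.explode. This is a rough sketch under that assumption; the id column and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: an extra "id" column next to the lists of dictionaries.
df = pd.DataFrame({
    "id": [1, 2],
    "comments": [
        [{"text": "a", "score": 0}, {"text": "b", "score": 1}],
        [{"text": "c", "score": 2}],
    ],
})

# explode() gives each dictionary its own row and repeats the other columns.
exploded = df.explode("comments", ignore_index=True)

# With one dictionary per row, json_normalize can expand the keys into columns.
expanded = pd.json_normalize(exploded["comments"].tolist())

# Both frames share the same 0..n-1 index, so a column-wise concat lines them up.
result = pd.concat([exploded.drop(columns="comments"), expanded], axis=1)
print(result)
```

Note that explode turns an empty list into a single NaN row, so such rows would need to be dropped before the json_normalize step.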

Use a list comprehension:

import pandas as pd

data = {
    'id': [1, 2, 3],
    'comments': [
        [{'text': 'comment 1', 'score': 1}, {'text': 'comment 2', 'score': 2}],
        [{'text': 'comment 3', 'score': 3}],
        [{'text': 'comment 4', 'score': 4}, {'text': 'comment 5', 'score': 5}, {'text': 'comment 6', 'score': 6}]
    ]
}
df = pd.DataFrame(data)

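# Build one small DataFrame per row; keys=df.index records the source row of each comment
comments_df = pd.concat([pd.DataFrame(c) for c in df['comments']], keys=df.index)

# Drop the inner position level, then join on the original row label so each comment keeps its id
result_df = df.drop(columns=['comments']).join(comments_df.droplevel(1)).reset_index(drop=True)

print(result_df)

Output:

   id       text  score
0   1  comment 1      1
1   1  comment 2      2
2   2  comment 3      3
3   3  comment 4      4
4   3  comment 5      5
5   3  comment 6      6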
