
In my data, the column df['comments'] has 88107 rows, each containing a list of dictionaries:

`
[{'text': "It will be curious to see where this heads in the long run.  CBS is on a tear but will it fit their image, will they try and establish control, overall agenda.  I've enjoyed last.fm for many years supporting through paypal donations each time I expire...it'll be interesting.",
  'score': 0},
 {'text': "Does this mean that there's now a big-name company who will fight for the repeal of the recent streaming-music royalty hike?",
  'score': 1},
 {'text': 'Also on BBC News:  http://news.bbc.co.uk/1/low/technology/6701863.stm .Nice to see a London-based co. hit the headlines.',
  'score': 2},
 {'text': "I don't understand what they do that is worth $70M a year. ",
  'score': 3},
 {'text': 'sold out too cheaply. given their leadership position, they should have ask for at least $500m',
  'score': 4}]
`

How can I make one big table with the columns 'text' and 'score'?

`l=pd.DataFrame()
for i in range(len(df['comments'][0])+1):
    n1=pd.DataFrame(data=df['comments'][i])
    l=pd.concat([l, n1], axis=0)`

I was able to build a table from a single list, but I can't process all 88,000 rows.

3 Answers


  1. If each value in comments is a list of dictionaries, flatten all of them into a single list and pass it to the DataFrame constructor:

    L = [{'text': "It will be curious to see where this heads in the long run.  CBS is on a tear but will it fit their image, will they try and establish control, overall agenda.  I've enjoyed last.fm for many years supporting through paypal donations each time I expire...it'll be interesting.",
      'score': 0},
     {'text': "Does this mean that there's now a big-name company who will fight for the repeal of the recent streaming-music royalty hike?",
      'score': 1},
     {'text': 'Also on BBC News:  http://news.bbc.co.uk/1/low/technology/6701863.stm .Nice to see a London-based co. hit the headlines.',
      'score': 2},
     {'text': "I don't understand what they do that is worth $70M a year. ",
      'score': 3},
     {'text': 'sold out too cheaply. given their leadership position, they should have ask for at least $500m',
      'score': 4}]
    
    df = pd.DataFrame({'comments':[ L, L, L]})
    

    df = pd.DataFrame([y for x in df['comments'] for y in x])
    print (df)
                                                     text  score
    0   It will be curious to see where this heads in ...      0
    1   Does this mean that there's now a big-name com...      1
    2   Also on BBC News:  http://news.bbc.co.uk/1/low...      2
    3   I don't understand what they do that is worth ...      3
    4   sold out too cheaply. given their leadership p...      4
    5   It will be curious to see where this heads in ...      0
    6   Does this mean that there's now a big-name com...      1
    7   Also on BBC News:  http://news.bbc.co.uk/1/low...      2
    8   I don't understand what they do that is worth ...      3
    9   sold out too cheaply. given their leadership p...      4
    10  It will be curious to see where this heads in ...      0
    11  Does this mean that there's now a big-name com...      1
    12  Also on BBC News:  http://news.bbc.co.uk/1/low...      2
    13  I don't understand what they do that is worth ...      3
    14  sold out too cheaply. given their leadership p...      4
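
With 88,000 real rows, some cells may be empty lists or missing altogether, and the plain comprehension above would raise on a missing cell. A minimal sketch of a more defensive variant follows. The sample frame and the names cells and flat are made up for illustration, and it assumes every valid cell is a Python list:

```python
from itertools import chain

import pandas as pd

# Hypothetical sample: one row has no comments, one row is missing entirely.
df = pd.DataFrame({
    "comments": [
        [{"text": "first", "score": 0}, {"text": "second", "score": 1}],
        [],
        None,
        [{"text": "third", "score": 2}],
    ]
})

# Keep only cells that really are lists, then chain them into one flat list of dicts.
cells = (c for c in df["comments"] if isinstance(c, list))
flat = pd.DataFrame(list(chain.from_iterable(cells)), columns=["text", "score"])

print(flat)
```

Passing columns explicitly keeps the column order fixed even if some dictionaries list their keys in a different order.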
    
  2. If each value in comments is a single dictionary (as in the output below), you can use pd.json_normalize:

    >>> pd.concat([df, pd.json_normalize(df['comments'])], axis=1)
    
                                                comments                                               text  score
    0  {'text': 'It will be curious to see where this...  It will be curious to see where this heads in ...      0
    1  {'text': 'Does this mean that there's now a bi...  Does this mean that there's now a big-name com...      1
    2  {'text': 'Also on BBC News:  http://news.bbc.c...  Also on BBC News:  http://news.bbc.co.uk/1/low...      2
    3  {'text': 'I don't understand what they do that...  I don't understand what they do that is worth ...      3
    4  {'text': 'sold out too cheaply. given their le...  sold out too cheaply. given their leadership p...      4
    

    Or use the DataFrame constructor:

    >>> pd.concat([df, pd.DataFrame(df['comments'].tolist())], axis=1)
                                                comments                                               text  score
    0  {'text': 'It will be curious to see where this...  It will be curious to see where this heads in ...      0
    1  {'text': 'Does this mean that there's now a bi...  Does this mean that there's now a big-name com...      1
    2  {'text': 'Also on BBC News:  http://news.bbc.c...  Also on BBC News:  http://news.bbc.co.uk/1/low...      2
    3  {'text': 'I don't understand what they do that...  I don't understand what they do that is worth ...      3
    4  {'text': 'sold out too cheaply. given their le...  sold out too cheaply. given their leadership p...      4
    

    If you want to drop the comments column, replace df['comments'] with df.pop('comments'):

    >>> pd.concat([df, pd.json_normalize(df.pop('comments'))], axis=1)
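
When each cell holds a list of dictionaries, as in the question, rather than a single dictionary, one possible approach is to explode the column first so that json_normalize receives one dictionary per row. A sketch under that assumption, with a made-up two-row frame and the hypothetical names exploded and table:

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data.
df = pd.DataFrame({
    "comments": [
        [{"text": "a", "score": 0}, {"text": "b", "score": 1}],
        [{"text": "c", "score": 2}],
    ]
})

# explode() gives each dictionary its own row; the index still points at the source row.
# Empty lists turn into NaN, so drop those before normalizing.
exploded = df["comments"].explode().dropna()

# json_normalize() takes a list of dicts and returns one column per key.
table = pd.json_normalize(exploded.tolist())

# Optional: remember which source row each comment came from.
table.index = exploded.index

print(table)
```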
    
  3. Use a list comprehension:

    import pandas as pd
    
    data = {
        'id': [1, 2, 3],
        'comments': [
            [{'text': 'comment 1', 'score': 1}, {'text': 'comment 2', 'score': 2}],
            [{'text': 'comment 3', 'score': 3}],
            [{'text': 'comment 4', 'score': 4}, {'text': 'comment 5', 'score': 5}, {'text': 'comment 6', 'score': 6}]
        ]
    }
    df = pd.DataFrame(data)
    
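    # Build one frame per row of comments, labelled with the parent row's index
    comments_df = pd.concat(
        [pd.DataFrame(c, index=[i] * len(c)) for i, c in df['comments'].items()]
    )
    
    # Join on the index so each comment keeps the id of the row it came from
    result_df = df.drop(columns=['comments']).join(comments_df).reset_index(drop=True)
    
    print(result_df)
    

    Output:

       id       text  score
    0   1  comment 1      1
    1   1  comment 2      2
    2   2  comment 3      3
    3   3  comment 4      4
    4   3  comment 5      5
    5   3  comment 6      6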
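
If every comment should carry the id of the row it came from, json_normalize can also do the flattening through its record_path and meta parameters. A sketch, reusing sample data in the same shape as this answer's; records and flat are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "comments": [
        [{"text": "comment 1", "score": 1}, {"text": "comment 2", "score": 2}],
        [{"text": "comment 3", "score": 3}],
        [{"text": "comment 4", "score": 4}, {"text": "comment 5", "score": 5},
         {"text": "comment 6", "score": 6}],
    ],
})

# Turn each row into a plain dict: {"id": ..., "comments": [...]}
records = df.to_dict("records")

# record_path names the nested list to unpack; meta copies parent fields onto each record.
flat = pd.json_normalize(records, record_path="comments", meta=["id"])

print(flat[["id", "text", "score"]])
```

The meta column may come back with object dtype, so cast it (for example with astype(int)) if numeric operations on id are needed.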
    