
I’m scraping user/follower information from the Twitter API using Tweepy. I currently store the data as a dictionary in which every key is a unique Twitter user and each value is a list of that user’s follower IDs.

The data looks like this:

{'realDonaldTrump': [
    123456,
    123457,
    123458,
    ...
    ],
 'BarackObama' : [
    999990,
    999991,
    999992,
    ...
    ]}

What I need is a dataframe that looks like this:

user             follower
realDonaldTrump  123456
realDonaldTrump  123457
realDonaldTrump  123458
...              ...
BarackObama      999990
BarackObama      999991
BarackObama      999992
...              ...

I’ve already tried:

df = pd.DataFrame.from_dict(followers)

but it gives me a new column for each key and doesn’t handle follower lists of uneven length.

Is there a smart way to convert the dictionary structure I have into a dataframe? Or should I store the initial data differently?
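
For reference, the behaviour described above can be reproduced with a small sketch. The user names and IDs here are placeholders, not real data. It assumes a reasonably recent pandas, where lists of unequal length passed column-wise raise a `ValueError`, while `orient="index"` pads the shorter rows with missing values (still a wide layout, not the long one wanted here).

```python
import pandas as pd

# Placeholder data with follower lists of unequal length
followers = {"user_a": [1, 2, 3], "user_b": [4, 5]}

error = None
try:
    # Column-wise construction: one column per key, so all lists
    # must have the same length
    pd.DataFrame.from_dict(followers)
except ValueError as exc:
    error = exc

# Row-wise construction: one row per key, shorter lists padded with NaN
wide = pd.DataFrame.from_dict(followers, orient="index")
print(wide)
```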

3 Answers


  1. Use a list comprehension to build (user, follower) tuples and pass them to the DataFrame constructor:

    import pandas as pd

    followers = {'realDonaldTrump': [
        123456,
        123457
        ],
     'BarackObama' : [
        999990,
        999991,
        999992
        ]}
    
    # One (user, follower) tuple per follower ID, across all users
    df = pd.DataFrame([(k, x) for k, v in followers.items() for x in v],
                      columns=['user', 'follower'])
    print(df)
                  user  follower
    0  realDonaldTrump    123456
    1  realDonaldTrump    123457
    2      BarackObama    999990
    3      BarackObama    999991
    4      BarackObama    999992
    
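  2. Reshape the data into a dict with one list per column, then build the DataFrame from it:

    import pandas as pd

    # followers is the dict from the question
    final_dict = {'users': [], 'followers': []}
    for user, ids in followers.items():
        for follower_id in ids:
            # Repeat the user name once per follower ID
            final_dict['users'].append(user)
            final_dict['followers'].append(follower_id)
    
    df = pd.DataFrame.from_dict(final_dict)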
    

    Output:

        users           followers
    0   realDonaldTrump 123456
    1   realDonaldTrump 123457
    2   realDonaldTrump 123458
    3   BarackObama     999990
    4   BarackObama     999991
    5   BarackObama     999992
    
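  3. Fill an empty DataFrame row by row with .loc (simple, but slow for large follower lists because every assignment enlarges the frame):

    import pandas as pd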
    
    followers = {
        'realDonaldTrump': [123456, 123457, 123458],
        'BarackObama': [999990, 999991, 999992]
    }
    
    df = pd.DataFrame()
    
    i = 0
    for user in followers:
        for r in followers[user]:
            df.loc[i, 'user'] = user
            df.loc[i, 'record'] = r
            i = i + 1
    
    print(df)
    

    Result:

                 user    record
    0  realDonaldTrump  123456
    1  realDonaldTrump  123457
    2  realDonaldTrump  123458
    3      BarackObama  999990
    4      BarackObama  999991
    5      BarackObama  999992
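
As an alternative to the explicit loops in the answers above, here is a minimal sketch using `Series.explode`, which is not part of any of the original answers. It assumes pandas 0.25 or later and that every user has at least one follower: an empty list would explode to a NaN row, and the final integer cast would then fail.

```python
import pandas as pd

followers = {
    "realDonaldTrump": [123456, 123457],
    "BarackObama": [999990, 999991, 999992],
}

df = (
    pd.Series(followers, name="follower")  # one row per user, each value a list
    .explode()                             # one row per (user, follower) pair
    .rename_axis("user")                   # name the index so it becomes a column
    .reset_index()
)

# explode() leaves the column as object dtype; cast the IDs back to integers
df["follower"] = df["follower"].astype("int64")
print(df)
```

The row order follows the dictionary's insertion order, so the result should match the frame produced by the list-comprehension answer.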
    