How to make multiple calls to Twitter API to get more than 200 Tweets per user using Tweepy?

MitchellLaferla
February 9, 2020
236 views
0 votes
2 Answers

I have some Python code that retrieves up to 200 Tweets (the maximum per call) from each of the USA Democratic political candidates’ Twitter accounts. Because I have it set to exclude replies and Retweets, it actually returns far fewer. I know that a single call returns at most 200 Tweets, but that you can make multiple calls (specifically 180 in a 15-minute window), which would return many more Tweets. My question is how to make multiple calls while still returning the data in the pandas DataFrame format that I currently have set up. Thanks!

import datetime as dt
import os
import numpy as np
import pandas as pd
import tweepy as tw

#define developer's permissions
consumer_key = 'xxxxxxxx'
consumer_secret = 'xxxxxxxx'
access_token = 'xxxxxx'
access_token_secret = 'xxxxxxx'

#access twitter's API
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)

#function collects tweets from a single handle's timeline and returns them as a DataFrame
def get_tweets(handle):
    try:
        tweets = api.user_timeline(screen_name=handle,
                                   count=200,
                                   exclude_replies=True,
                                   include_rts=False,
                                   tweet_mode="extended")
        print(handle, "Number of tweets extracted: {}\n".format(len(tweets)))
        df = pd.DataFrame(data=[tweet.user.screen_name for tweet in tweets], columns=['handle'])
        df['tweets'] = np.array([tweet.full_text for tweet in tweets])
        df['date'] = np.array([tweet.created_at for tweet in tweets])
        df['len'] = np.array([len(tweet.full_text) for tweet in tweets])
        df['like_count'] = np.array([tweet.favorite_count for tweet in tweets])
        df['rt_count'] = np.array([tweet.retweet_count for tweet in tweets])
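    except Exception as err:
        #report the failure and return an empty frame so the caller's concat still works
        print(handle, "could not be retrieved:", err)
        df = pd.DataFrame()
    return df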

#list of all the candidate twitter handles
handles = ['@JoeBiden', '@ewarren', '@BernieSanders', '@MikeBloomberg', '@PeteButtigieg', '@AndrewYang', '@AmyKlobuchar']
df = pd.DataFrame()

#loop through the different candidate twitter handles and collect each candidate's tweets
for handle in handles:
    df_new = get_tweets(handle)
    df = pd.concat((df, df_new))

@JoeBiden Number of tweets extracted: 200.

@ewarren Number of tweets extracted: 200.

@BernieSanders Number of tweets extracted: 200.

@MikeBloomberg Number of tweets extracted: 200.

@PeteButtigieg Number of tweets extracted: 200.

@AndrewYang Number of tweets extracted: 200.

@AmyKlobuchar Number of tweets extracted: 200.

Answers

- Harmon758
- February 9, 2020 at 8:46 am
- 0 votes
First of all, you’re going to want to regenerate your credentials now.

You can iterate through paginated results with a Cursor or by passing the since_id and/or max_id parameters for API.user_timeline.

See also the documentation for the GET statuses/user_timeline endpoint.
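
A minimal sketch of the max_id approach mentioned above, under a few assumptions: the function name fetch_timeline and its max_tweets and page_size parameters are made up for illustration, api is an authenticated tweepy API object like the one in the question, and the row keys simply mirror the question's DataFrame columns. It requests one page, remembers the oldest Tweet ID it saw, and asks for the next page with max_id set just below that ID until a page comes back empty.

```python
def fetch_timeline(api, handle, max_tweets=3200, page_size=200):
    """Collect rows from one user's timeline by paging backwards with max_id.

    Hypothetical helper: `api` only needs a user_timeline() method that
    behaves like tweepy's API.user_timeline.
    """
    rows = []
    max_id = None  # None means "start from the newest Tweet"
    while len(rows) < max_tweets:
        kwargs = dict(screen_name=handle,
                      count=page_size,
                      exclude_replies=True,
                      include_rts=False,
                      tweet_mode="extended")
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = api.user_timeline(**kwargs)
        # Caveat: an empty page ends the loop, even if it is only empty
        # because every Tweet on it was a reply or Retweet.
        if not page:
            break
        for tweet in page:
            rows.append({"handle": tweet.user.screen_name,
                         "tweets": tweet.full_text,
                         "date": tweet.created_at,
                         "len": len(tweet.full_text),
                         "like_count": tweet.favorite_count,
                         "rt_count": tweet.retweet_count})
        # max_id is inclusive, so step one below the oldest ID already seen.
        max_id = min(tweet.id for tweet in page) - 1
    return rows[:max_tweets]

# Usage with the question's setup (needs real credentials, so commented out):
# frames = [pd.DataFrame(fetch_timeline(api, handle)) for handle in handles]
# df = pd.concat(frames, ignore_index=True)
```

tw.Cursor(api.user_timeline, screen_name=handle, count=200, tweet_mode="extended").items() does the same paging for you; the explicit loop is shown only to make the max_id mechanics visible. Either way, this endpoint only reaches back through a user's most recent 3,200 Tweets.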


- Punnerud
- February 10, 2020 at 12:32 pm
- 0 votes
The Twitter API documentation explains why you get fewer results:

exclude_replies – “This parameter will prevent replies from appearing in the returned timeline. Using exclude_replies with the count parameter will mean you will receive up-to count tweets — this is because the count parameter retrieves that many Tweets before filtering out retweets and replies.”
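
To make the quoted behaviour concrete, here is a toy model in plain Python. It is not Twitter's implementation: the function simulate_page and the is_reply and is_retweet flags are invented for illustration. It only mimics the order of operations the documentation describes, where count is applied first and the filters afterwards.

```python
def simulate_page(raw_timeline, count, exclude_replies=True, include_rts=False):
    """Toy model: take `count` Tweets first, then drop replies and Retweets."""
    page = raw_timeline[:count]  # count is applied before any filtering
    if exclude_replies:
        page = [t for t in page if not t["is_reply"]]
    if not include_rts:
        page = [t for t in page if not t["is_retweet"]]
    return page

# Six made-up Tweets, newest first.
raw = [
    {"id": 6, "is_reply": False, "is_retweet": False},
    {"id": 5, "is_reply": True,  "is_retweet": False},
    {"id": 4, "is_reply": False, "is_retweet": True},
    {"id": 3, "is_reply": False, "is_retweet": False},
    {"id": 2, "is_reply": True,  "is_retweet": False},
    {"id": 1, "is_reply": False, "is_retweet": False},
]

page = simulate_page(raw, count=4)
print([t["id"] for t in page])  # [6, 3]: four requested, two returned
```

The same effect explains why a request with count=200 can return far fewer than 200 Tweets when replies and Retweets are excluded.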
