I have some Python code that retrieves up to 200 Tweets from each of the USA Democratic political candidates' Twitter accounts. Because I have it set to exclude replies and Retweets, it actually returns far fewer. I know that a single call returns at most 200 Tweets, but you can make multiple calls (specifically 180 in a 15-minute window), which would return many more Tweets. My question is how to go about making multiple calls while still returning the data in the pandas DataFrame format that I currently have set up. Thanks!
import datetime as dt
import os

import numpy as np
import pandas as pd
import tweepy as tw

# Define the developer's credentials
consumer_key = 'xxxxxxxx'
consumer_secret = 'xxxxxxxx'
access_token = 'xxxxxx'
access_token_secret = 'xxxxxxx'

# Authenticate against Twitter's API
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)

# Collect tweets from one account's timeline and return them as a DataFrame
def get_tweets(handle):
    try:
        tweets = api.user_timeline(screen_name=handle,
                                   count=200,
                                   exclude_replies=True,
                                   include_rts=False,
                                   tweet_mode="extended")
        print(handle, "Number of tweets extracted: {}\n".format(len(tweets)))
        df = pd.DataFrame(data=[tweet.user.screen_name for tweet in tweets], columns=['handle'])
        df['tweets'] = np.array([tweet.full_text for tweet in tweets])
        df['date'] = np.array([tweet.created_at for tweet in tweets])
        df['len'] = np.array([len(tweet.full_text) for tweet in tweets])
        df['like_count'] = np.array([tweet.favorite_count for tweet in tweets])
        df['rt_count'] = np.array([tweet.retweet_count for tweet in tweets])
    except Exception as exc:
        # Report the failure and return an empty frame so the caller can still concat
        print(handle, "failed:", exc)
        return pd.DataFrame()
    return df

# List of all the candidate Twitter handles
handles = ['@JoeBiden', '@ewarren', '@BernieSanders', '@MikeBloomberg', '@PeteButtigieg', '@AndrewYang', '@AmyKlobuchar']
df = pd.DataFrame()

# Loop through the different candidate handles and collect each candidate's tweets
for handle in handles:
    df_new = get_tweets(handle)
    df = pd.concat((df, df_new))
@JoeBiden Number of tweets extracted: 200.
@ewarren Number of tweets extracted: 200.
@BernieSanders Number of tweets extracted: 200.
@MikeBloomberg Number of tweets extracted: 200.
@PeteButtigieg Number of tweets extracted: 200.
@AndrewYang Number of tweets extracted: 200.
@AmyKlobuchar Number of tweets extracted: 200.
2 Answers
First of all, you’re going to want to regenerate your credentials now.
You can iterate through paginated results with a Cursor or by passing the since_id and/or max_id parameters for API.user_timeline. See also the documentation for the GET statuses/user_timeline endpoint.
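As a rough illustration of the max_id approach, here is a minimal sketch. It assumes `api` is the authenticated `tw.API` object from the question; the function name `fetch_timeline_rows` and its `max_tweets` and `page_size` parameters are made up for this example. It keeps the question's column names and pages backwards through the timeline until the API returns an empty page.

```python
def fetch_timeline_rows(api, handle, max_tweets=3200, page_size=200):
    """Collect up to max_tweets rows for one handle by paging with max_id.

    Hypothetical helper: `api` is assumed to expose user_timeline() the way
    the tweepy API object in the question does.
    """
    rows = []
    max_id = None  # None means "start from the newest tweet"
    while len(rows) < max_tweets:
        kwargs = dict(screen_name=handle,
                      count=page_size,
                      exclude_replies=True,
                      include_rts=False,
                      tweet_mode="extended")
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = api.user_timeline(**kwargs)
        # Replies and retweets are filtered out after `count` is applied, so a
        # short page is normal. Only an empty page ends the loop. Caveat: a run
        # of page_size consecutive replies/retweets also yields an empty page.
        if not page:
            break
        for tweet in page:
            rows.append({
                "handle": tweet.user.screen_name,
                "tweets": tweet.full_text,
                "date": tweet.created_at,
                "len": len(tweet.full_text),
                "like_count": tweet.favorite_count,
                "rt_count": tweet.retweet_count,
            })
        # Ask only for tweets strictly older than the oldest one seen so far
        max_id = page[-1].id - 1
    return rows[:max_tweets]

# Possible usage with the question's setup (needs real credentials):
#   rows = [row for handle in handles for row in fetch_timeline_rows(api, handle)]
#   df = pd.DataFrame(rows)
```

Building one list of dicts and calling `pd.DataFrame` once avoids concatenating a new frame per call. The default of 3200 reflects the documented cap on how far back this endpoint reaches. As far as I know, tweepy's Cursor does the same max_id bookkeeping for `user_timeline`, so `tw.Cursor(api.user_timeline, ...).items()` should be a shorter alternative.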
The Twitter API documentation explains why you get fewer results:
exclude_replies – “This parameter will prevent replies from appearing in the returned timeline. Using exclude_replies with the count parameter will mean you will receive up-to count tweets — this is because the count parameter retrieves that many Tweets before filtering out retweets and replies.”