
I’m trying to pull about a month of tweets from Twitter for a project. There are fewer than 10,000 tweets with this hashtag over that period, but I only seem to get tweets from the current day. I got 68 yesterday and 80 today; in both cases every tweet was timestamped with the current day.

api = tweepy.API(auth)
igsjc_tweets = api.search(q="#igsjc", since='2014-12-31', count=100000)

ipdb> len(igsjc_tweets)
80

I know for certain there should be more than 80 tweets. I’ve heard that Twitter rate-limits to 1500 tweets at a time, but does it also limit results to a single day? Note that I’ve also tried the Cursor approach with

igsjc_tweets = tweepy.Cursor(api.search, q="#igsjc", since='2015-12-31', count=10000)

This also only gets me 80 tweets. Any tips or suggestions on how to get the full data would be appreciated.

2 Answers


  1. Here’s the official tweepy tutorial on Cursor. Note that you need to iterate through the Cursor, as shown below. There is also a maximum count that you can pass to .items(), so it’s probably a good idea to pull month by month (or similar) and to sleep between calls. HTH!

    igsjc_tweets_jan = [tweet for tweet in tweepy.Cursor(
                        api.search, q="#igsjc", since='2016-01-01', until='2016-01-31').items(1000)] 
    
  2. First, tweepy cannot retrieve very old tweets through the search API.
    I don’t know the exact limit, but the window comes from Twitter’s standard search API rather than from tweepy, and Twitter documents it as covering roughly the last 7 days.

    Anyway, you can use this piece of code to get tweets.
    I run it to get tweets from the last few days, and it works for me.

    Note that you can refine it and add geocode information; I left an example commented out for you.

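    # Assumes `api` is the authenticated tweepy.API object from the question.
    flag = True
    last_id = None
    while flag:
       flag = False  # stays False if this pass returns no tweets
       for status in tweepy.Cursor(api.search,
                              #q='geocode:"37.781157,-122.398720,1mi" since:'+since+' until:'+until+' include:retweets',
                              q="#igsjc",
                              since='2015-12-31',
                              # max_id is inclusive, so step below the last tweet already seen
                              max_id=last_id - 1 if last_id else None,
                              result_type='recent',
                              include_entities=True,
                              monitor_rate_limit=False,
                              wait_on_rate_limit=False).items(300):
           tweet = status._json  # raw JSON dict for this tweet
           print(tweet)

           flag = True  # there is still more data to collect
           last_id = status.id  # oldest tweet seen so far, for the next pass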
    

    Good luck
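
The two suggestions above (fetch in date windows with a pause between calls, and page backwards with max_id) can be sketched without touching the network. This is only a minimal sketch under stated assumptions: `date_windows`, `collect_window`, and `fetch_page` are hypothetical names, not part of tweepy. `fetch_page` stands in for whatever wrapper you write around `api.search`. The sketch also assumes `until` is exclusive and `max_id` is inclusive, which is how I understand the search API to behave.

```python
import time
from datetime import date, timedelta


def date_windows(start, end, days=7):
    """Yield (since, until) ISO date strings that tile [start, end) in chunks."""
    cursor = start
    while cursor < end:
        upper = min(cursor + timedelta(days=days), end)
        yield cursor.isoformat(), upper.isoformat()
        cursor = upper


def collect_window(fetch_page, since, until, pause=0.0):
    """Page backwards through one date window using max_id.

    fetch_page is a hypothetical callable (since, until, max_id) -> list of
    objects with an integer .id, newest first. Because max_id is assumed to
    be inclusive, the next request asks for (oldest id seen) - 1 so the same
    tweet is not fetched twice.
    """
    collected = []
    max_id = None
    while True:
        page = fetch_page(since, until, max_id)
        if not page:
            return collected  # an empty page means the window is exhausted
        collected.extend(page)
        max_id = min(tweet.id for tweet in page) - 1
        time.sleep(pause)  # pause between calls to stay under the rate limit
```

With tweepy, `fetch_page` could be something like `lambda since, until, max_id: api.search(q="#igsjc", since=since, until=until, max_id=max_id, count=100)`, using the same parameters that appear in the answers above. Looping `collect_window` over `date_windows(date(2016, 1, 1), date(2016, 2, 1))` would then cover January one week at a time, as far back as the search index reaches.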
