I'm trying to collect a large number of tweets from the Twitter API using tweepy. I have a text file containing around ten thousand tweet IDs. My program reads through the file, grabbing each tweet as well as the tweet it is replying to. It then saves the text of each tweet and the username of each author in separate text files. Here's the code:
    auth = tweepy.OAuthHandler(ckey, csecret)
    auth.set_access_token(atoken, asecret)
    api = tweepy.API(auth)
    tweetsFile = open("srcstic.txt", "r")
    tweets_seen = set() # holds tweets already seen

    def getNextLine():
        while True:
            tweetID = tweetsFile.readline()
            getTweetObj(tweetID)
            if not tweetID:
                break

    def getTweetObj(tweetID):
        try:
            tweetObj = api.get_status(tweetID)
            #sleep(11)
        except tweepy.error.TweepError:
            getNextLine()
        else:
            pass
        tweet = tweetObj.text.encode("utf8")
        idnum = tweetObj.in_reply_to_status_id_str
        try:
            former = api.get_status(idnum)
        except tweepy.error.TweepError:
            getNextLine()
        else:
            printFiles(former, tweetObj, tweet)

    def printFiles(former, tweetObj, tweet):
        callUserName = former.user.screen_name
        responseUserName = tweetObj.user.screen_name
        if tweet not in tweets_seen:
            with open("callauthors.txt", "a") as callauthors:
                cauthors = callUserName + "\n"
                callauthors.write(cauthors)
            with open("responseauthors.txt", "a") as responseauthors:
                rauthors = responseUserName + "\n"
                responseauthors.write(rauthors)
            with open("response_tweets.txt", "a") as responsetweets:
                output = (tweetObj.text.encode('utf-8')+"\n")
                responsetweets.write(output)
            with open("call_tweets.txt", "a") as calltweets:
                output = (former.text.encode('utf-8')+"\n")
                calltweets.write(output)
            tweets_seen.add(tweet)

    getNextLine()
Everything works fine for a while, but then I get the following errors:
      File "gettweets2.py", line 68, in <module>
        getNextLine()
      File "gettweets2.py", line 21, in getNextLine
        getTweetObj(tweetID)
      File "gettweets2.py", line 40, in getTweetObj
        getNextLine()
      File "gettweets2.py", line 21, in getNextLine
        getTweetObj(tweetID)
      File "gettweets2.py", line 31, in getTweetObj
        getNextLine()
      File "gettweets2.py", line 21, in getNextLine
        getTweetObj(tweetID)
      File "gettweets2.py", line 31, in getTweetObj
        getNextLine()
      File "gettweets2.py", line 21, in getNextLine
        getTweetObj(tweetID)
      File "gettweets2.py", line 31, in getTweetObj
        getNextLine()
    ........
    ........
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_abcoll.py", line 540, in update
        if isinstance(other, Mapping):
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/abc.py", line 141, in __instancecheck__
        subtype in cls._abc_negative_cache):
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/_weakrefset.py", line 73, in __contains__
        return wr in self.data
    RuntimeError: maximum recursion depth exceeded in cmp
Any ideas what's going wrong here?
Thanks.
2 Answers
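By default, Python only lets calls nest about 1000 deep (see sys.getrecursionlimit()); once the recursion goes past that, you get this error. Here, every failed lookup calls getNextLine() again from inside getTweetObj(), so the stack keeps growing until it hits the limit. Instead, you could drive the calls from outside the function with a loop and a conditional statement, or create a generator.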
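A minimal sketch of the loop-plus-generator idea, under some assumptions: iter_tweet_ids and collect are hypothetical names, and fetch and error_type are placeholders that, with the setup in the question, would presumably be api.get_status and tweepy.error.TweepError. On a failed lookup the loop just moves on to the next ID, so nothing piles up on the call stack.

```python
def iter_tweet_ids(lines):
    """Yield each non-empty tweet ID from an iterable of lines (e.g. an open file)."""
    for line in lines:
        tweet_id = line.strip()
        if tweet_id:
            yield tweet_id


def collect(lines, fetch, error_type=Exception):
    """Fetch every ID in one flat loop and return (fetched, skipped).

    `fetch` and `error_type` are placeholders (assumptions): in the question's
    script they would be api.get_status and tweepy.error.TweepError.
    """
    fetched, skipped = [], []
    for tweet_id in iter_tweet_ids(lines):
        try:
            fetched.append(fetch(tweet_id))
        except error_type:
            # No recursive call here: record the failure and carry on.
            skipped.append(tweet_id)
            continue
    return fetched, skipped
```

Because the file object is itself an iterable of lines, it could be passed straight in as `lines`, and the loop ends on its own when the file runs out.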
As best I can read this, a read error can throw you into infinite recursion, as each routine calls the other. If fetching the next line doesn’t break you out of the error condition, you’ll exceed the stack limit in less than a second.
If nothing else, do a quick check with a couple of print statements: print out the tweetIDs as you encounter them, labeled to identify the print location. A direct repair may well include writing a second routine that grabs the original tweet, without having the capability to recur. This assumes that you need only the immediate parent of the current tweet, rather than the entire chain.
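One way that second routine might look, as a rough sketch rather than a drop-in fix: fetch_pair is a hypothetical helper, get_status stands in for api.get_status, and error_type stands in for tweepy.error.TweepError. It fetches the tweet and its immediate parent, prints a labeled message whenever it gives up, and returns None instead of calling back into the reader.

```python
def fetch_pair(get_status, tweet_id, error_type=Exception):
    """Return (parent, reply) for a reply tweet, or None if a lookup fails.

    `get_status` and `error_type` are assumptions standing in for
    api.get_status and tweepy.error.TweepError from the question.
    """
    try:
        reply = get_status(tweet_id)
    except error_type:
        print("reply lookup failed: %s" % tweet_id)
        return None

    parent_id = reply.in_reply_to_status_id_str
    if parent_id is None:
        # The tweet is not a reply, so there is no parent to fetch.
        print("not a reply: %s" % tweet_id)
        return None

    try:
        parent = get_status(parent_id)
    except error_type:
        print("parent lookup failed: %s" % parent_id)
        return None

    # Only the immediate parent is fetched; the rest of the chain is ignored.
    return parent, reply
```

The caller's loop can then check for None and skip to the next ID, which keeps all of the control flow in one place and makes the printed labels show exactly which lookup failed.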