I have a working REST Search API script that pulls tweets according to https://www.karambelkar.info/2015/01/how-to-use-twitters-search-rest-api-most-effectively./
Problem: This code works, but pulls tweets with searchQuery1
and searchQuery2
. (e.g. tweets with Prostate Cancer
+ Colon Cancer
). I don’t want this. Instead, I would like to get all of tweets from searchQuery1
(only tweets with Prostate Cancer
), and then all of the tweets from searchQuery2
, (only tweets with Colon Cancer
). The queries should run separately.
Goal: Sequentially loop over X number of search queries (e.g. searchQuery1
, searchQuery2
, etc)
Thank you!
searchQuery1 = 'Prostate Cancer'
searchQuery2 = 'Colon Cancer'
maxTweets = 10000
tweetsPerQry = 100
fprefix = 'REST'
sinceId = None
max_id = -1L
tweetCount = 0
with open('/Users/eer/Desktop/' + fprefix + '.' + time.strftime('%Y-%m-%d_%H-%M-%S') + '.json', 'a+') as f: #open file
while tweetCount < maxTweets:
try:
if (max_id <= 0):
if (not sinceId):
for x,y in zip(searchQuery1,searchQuery2):
new_tweets = api.search(q=[searchQuery1, searchQuery2], count=tweetsPerQry)
else:
print "sinceID 1"
new_tweets = api.search(q=[searchQuery1, searchQuery2], count=tweetsPerQry,
since_id=sinceId)
else:
if (not sinceId):
print "not sinceID 2"
new_tweets = api.search(q=[searchQuery1, searchQuery2], count=tweetsPerQry,
max_id=str(max_id - 1))
else:
print "sinceID 1"
new_tweets = api.search(q=[searchQuery1, searchQuery2], count=tweetsPerQry,
max_id=str(max_id - 1),
since_id=sinceId)
if not new_tweets:
print("No more tweets found")
break
for tweet in new_tweets:
f.write(jsonpickle.encode(tweet._json, unpicklable=False) +
'n')
tweetCount += len(new_tweets)
max_id = new_tweets[-1].id
except tweepy.TweepError as e:
print("some error : " + str(e))
break
print ("Downloaded {0} tweets, Saved to {1}".format(tweetCount, fprefix))
2
Answers
I would change your query to
'"Prostate Cancer" OR "Colon Cancer"'
and store the results. Then order them how you want later. It sounds like you want the following in pseudo-code: