Hi, I am having an issue with searching for a specific piece of text within a tweet. I am currently using tweepy to stream tweets based on an array of keywords (called filterKeywords); however, I want a specific function to run depending on which keyword the tweet was filtered by.
I load the tweet into a JSON variable and, in my on_data method, use a for loop to cycle through the filterKeywords array, with an if statement that checks whether the current element of filterKeywords appears anywhere in the 'text' field of the JSON tweet. However, it doesn't seem to be filtering anything and seems to go straight to the else branch of my if statement. Here is my code below. Any help would be much appreciated. Thanks.
import tweepy
import pymongo
import json

consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

filterKeywords = ['IBM', 'Microsoft', 'Facebook', 'Twitter', 'Apple', 'Google', 'Amazon', 'EBay', 'Diageo',
                  'General Motors', 'General Electric', 'Telefonica', 'Rolls Royce', 'Walmart', 'HSBC', 'BP',
                  'Investec', 'WWE', 'Time Warner', 'Santander Group']

class CustomStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        self.api = api
        super(tweepy.StreamListener, self).__init__()
        try:
            global conn
            conn = pymongo.MongoClient('localhost', 27017)
            print "Connected successfully!!!"
            global db
            db = conn.mydb
        except pymongo.errors.ConnectionFailure, e:
            print "Could not connect to MongoDB: %s" % e
            conn

    def on_data(self, data):
        datajson = json.loads(data)
        for word in filterKeywords:
            if word in datajson['text']:
                collection = db[word]
                collection.insert(datajson)
                print('Tweet found filtered by ' + word)
            else:
                print('')

    def on_error(self, status_code):
        return True  # Don't kill the stream

    def on_timeout(self):
        return True  # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
sapi.filter(track=filterKeywords)
2 Answers
I think your problem is that you included “Twitter” in the filter keywords, and that matches almost everything (the filter is applied not only to the text but to some other fields as well). Try removing it from the filter keywords.
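To see why a tweet can come through the stream without the keyword appearing in its 'text', here is a minimal sketch that checks a keyword against the other fields as well. It assumes the fields considered are the tweet text, the expanded_url and display_url of links and media, hashtag text, and mentioned screen names, and that matching ignores case; searchable_strings and keywords_matching_any_field are hypothetical helper names, not part of tweepy.

```python
def searchable_strings(tweet):
    """Collect the strings that keyword tracking is assumed to look at."""
    strings = [tweet.get('text') or '']
    entities = tweet.get('entities') or {}
    # Links and media both carry expanded_url / display_url.
    for item in (entities.get('urls') or []) + (entities.get('media') or []):
        strings.append(item.get('expanded_url') or '')
        strings.append(item.get('display_url') or '')
    for tag in entities.get('hashtags') or []:
        strings.append(tag.get('text') or '')
    for mention in entities.get('user_mentions') or []:
        strings.append(mention.get('screen_name') or '')
    return strings


def keywords_matching_any_field(tweet, keywords):
    """Return the keywords found in any of the collected strings, ignoring case."""
    haystack = ' '.join(searchable_strings(tweet)).lower()
    return [word for word in keywords if word.lower() in haystack]
```

With a tweet whose text is 'check this out' but which links to a twitter.com URL, keywords_matching_any_field would report 'Twitter' even though a plain check against datajson['text'] finds nothing.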
A simple mistake in your program: even after the if condition matches, the loop may enter the else branch on the next iteration. From your comments, if you wish to avoid keyError 'test', rewrite your function like:
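A minimal sketch of what such a rewrite could look like, not necessarily the code the answerer had in mind. It assumes the KeyError comes from stream messages that have no 'text' field (for example delete notices) and that matching should ignore case. find_matching_keywords, handle_tweet and store are hypothetical names introduced for illustration.

```python
def find_matching_keywords(tweet, keywords):
    """Return every keyword that occurs in the tweet's text, ignoring case."""
    text = tweet.get('text')  # .get() avoids a KeyError when 'text' is absent
    if not text:
        return []
    lowered = text.lower()
    return [word for word in keywords if word.lower() in lowered]


def handle_tweet(tweet, keywords, store):
    """Call store(word, tweet) once per matching keyword and return the matches.

    There is no per-iteration else branch, so a keyword that does not match
    produces no output at all.
    """
    matches = find_matching_keywords(tweet, keywords)
    for word in matches:
        store(word, tweet)
    return matches
```

Inside on_data this could be called as handle_tweet(json.loads(data), filterKeywords, lambda word, tweet: db[word].insert(tweet)). Note that a multi-word keyword such as 'General Motors' is still matched here as one exact substring.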