
Hi, I am having an issue with searching for a specific piece of text within a tweet. I am currently using tweepy to stream tweets based on an array of keywords (called filterKeywords); however, I want a specific function to run depending on which keyword the tweet was filtered by.

I load the tweet into a JSON variable and, in my on_data method, use a for loop to cycle through the filterKeywords array. Inside the loop, an if statement checks whether the current element of filterKeywords appears anywhere in the 'text' field of the JSON tweet. However, it doesn't seem to filter anything and goes straight to the else branch. Here is my code below. Any help would be much appreciated. Thanks.

import tweepy
import pymongo
import json

consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

filterKeywords = ['IBM', 'Microsoft', 'Facebook', 'Twitter', 'Apple', 'Google', 'Amazon', 'EBay', 'Diageo',
                  'General Motors', 'General Electric', 'Telefonica', 'Rolls Royce', 'Walmart', 'HSBC', 'BP',
                  'Investec', 'WWE', 'Time Warner', 'Santander Group']


class CustomStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        self.api = api
        super(tweepy.StreamListener, self).__init__()
        try:
            global conn
            conn = pymongo.MongoClient('localhost', 27017)
            print "Connected successfully!!!"
            global db
            db = conn.mydb
        except pymongo.errors.ConnectionFailure, e:
            print "Could not connect to MongoDB: %s" % e
            conn

    def on_data(self, data):
        datajson = json.loads(data)
        for word in filterKeywords:
            if word in datajson['text']:
                collection = db[word]
                collection.insert(datajson)
                print('Tweet found filtered by ' + word)
        else:
            print('')

    def on_error(self, status_code):
        return True  # Don't kill the stream

    def on_timeout(self):
        return True  # Don't kill the stream


sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))

sapi.filter(track=filterKeywords)

2 Answers


  1. I think your problem is that you included “Twitter” in the filter keywords, and that matches almost everything (the filter is applied not only to the text but to some other fields as well). Try removing it from the filter keywords.
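
To see why a tweet delivered by the stream may not contain the keyword in its 'text' field, it can help to look at the other fields too. The sketch below assumes the v1.1 tweet JSON layout (an `entities` object with `urls`, `hashtags`, and `user_mentions`) and assumes that the stream matches keywords without regard to case. `searchable_text` and `matched_keywords` are hypothetical helper names, not part of tweepy, and the sample payload is made up for illustration.

```python
def searchable_text(tweet):
    """Collect the strings a track keyword could plausibly have matched.

    Assumes the v1.1 tweet JSON layout; missing fields are skipped.
    """
    parts = [tweet.get('text') or '']
    entities = tweet.get('entities') or {}
    for url in entities.get('urls', []):
        parts.append(url.get('expanded_url') or '')
        parts.append(url.get('display_url') or '')
    for tag in entities.get('hashtags', []):
        parts.append(tag.get('text') or '')
    for mention in entities.get('user_mentions', []):
        parts.append(mention.get('screen_name') or '')
    return ' '.join(parts).lower()


def matched_keywords(tweet, keywords):
    """Return every keyword found in the tweet, ignoring case."""
    haystack = searchable_text(tweet)
    return [word for word in keywords if word.lower() in haystack]


# Hypothetical sample payload: the keyword appears only in a link.
sample = {
    'text': 'Great read: https://t.co/abc123',
    'entities': {
        'urls': [{'expanded_url': 'https://twitter.com/some/status',
                  'display_url': 'twitter.com/some/status'}],
        'hashtags': [],
        'user_mentions': [],
    },
}
print(matched_keywords(sample, ['IBM', 'Twitter']))
```

A plain `word in datajson['text']` test would miss this sample, since "Twitter" occurs only in the expanded URL.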

  2.
    def on_data(self, data):
        datajson = json.loads(data)
        # True if at least one keyword occurs in the tweet text
        if any(word in datajson["text"] for word in filterKeywords):
            pass  # do the desired function here
        else:
            print('if statement not working')
    

    It is a simple mistake in your program: even after the if condition succeeds for one keyword, the loop may enter the else branch on the next iteration.

    From your comments: if you wish to avoid the KeyError 'text', rewrite your function like this:

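    def on_data(self, data):
        datajson = json.loads(data)
        for word in filterKeywords:
            # .get() returns None when the message has no 'text' key
            if datajson.get('text') and word in datajson['text']:
                collection = db[word]
                collection.insert(datajson)
                print('Tweet found filtered by ' + word)
        else:
            # for/else: with no break in the loop, this runs once after the loop completes
            print('')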
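
How `else` behaves depends on whether it is attached to the `if` or to the `for`. The self-contained sketch below contrasts the two and then shows one possible way to run the fallback only once per tweet. The function names and the shortened `KEYWORDS` list are hypothetical, chosen only for illustration.

```python
KEYWORDS = ['IBM', 'Microsoft', 'Apple']  # shortened list for illustration


def else_on_if(text):
    """else paired with the if: runs for every keyword that does not match."""
    log = []
    for word in KEYWORDS:
        if word in text:
            log.append('match ' + word)
        else:
            log.append('no match ' + word)
    return log


def else_on_for(text):
    """else paired with the for: runs once after the loop, as there is no break."""
    log = []
    for word in KEYWORDS:
        if word in text:
            log.append('match ' + word)
    else:
        log.append('loop finished')
    return log


def fallback_once(text):
    """Collect the matches first, then decide once whether anything matched."""
    matches = [word for word in KEYWORDS if word in text]
    return matches if matches else ['no keyword found']


tweet_text = 'Apple and IBM announce a partnership'
print(else_on_if(tweet_text))     # one entry per keyword, matched or not
print(else_on_for(tweet_text))    # matches, then a single 'loop finished'
print(fallback_once(tweet_text))  # only the matched keywords
```

Collecting the matches before branching keeps the per-keyword work (such as choosing a collection) separate from the "nothing matched" case.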
    