
I have code that fetches the tweets from my Twitter timeline and saves them to a CSV file. How can I make it save only the tweets that contain a specific keyword X?

The code is below:

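import tweepy
import configparser

# Read the API credentials from config.ini
config = configparser.ConfigParser()
config.read('config.ini')
api_key = config['twitter']['api_key']
api_key_secret = config['twitter']['api_key_secret']
access_token = config['twitter']['access_token']
access_token_secret = config['twitter']['access_token_secret']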

auth = tweepy.OAuthHandler(api_key, api_key_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

public_tweets = api.home_timeline()
data = []

for tweet in public_tweets:
    data.append([tweet.created_at, tweet.user.screen_name, tweet.text])

2 Answers


  1. Python’s in operator tests whether one string occurs inside another, so a simple if is enough and you don’t need regex or anything more involved. Note that the check is case-sensitive:

    query_string = "word" # your keyword
    
    for tweet in public_tweets:
        if query_string in tweet.text:
            data.append([tweet.created_at, tweet.user.screen_name, tweet.text])
    
  2. The simplest approach is to check if keyword in tweet.text, but you’ll get false positives (e.g. baseball will match when keyword='ball'). A better approach is to use a regex:

    import tweepy
    import configparser
    import pandas as pd
    import re
    
    config = configparser.ConfigParser()
    config.read('config.ini')
    api_key = config['twitter']['api_key']
    api_key_secret = config['twitter']['api_key_secret']
    access_token = config['twitter']['access_token']
    access_token_secret = config['twitter']['access_token_secret']
    
    auth = tweepy.OAuthHandler(api_key, api_key_secret)
    auth.set_access_token(access_token, access_token_secret)
    
    api = tweepy.API(auth)
    
    public_tweets = api.home_timeline()
    
    columns = ['Time', 'User', 'Tweet']
    
    keywords = ['foo', 'bar']
    regex = re.compile(r'\b(' + '|'.join(keywords) + r')\b')
    data = [[tweet.created_at, tweet.user.screen_name, tweet.text]
            for tweet in public_tweets
            if regex.search(tweet.text)]
        
    df = pd.DataFrame(data, columns=columns)
    df.to_csv('Tweets.csv')
    

    Here \b matches a word boundary and | separates the alternatives in the group, so we search for any of the keywords as long as it is not part of a larger word. re.compile is used so the pattern is compiled once rather than on every iteration. A list comprehension is, IMO, more readable than calling .append() in a loop (and usually a little faster).
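
To see the word-boundary behaviour without calling the Twitter API, here is a minimal sketch that runs the same kind of pattern over plain strings. The helper name build_keyword_regex and the sample texts are made up for illustration. The re.escape call and the re.IGNORECASE flag are additions that the answer above does not use: the first keeps characters such as + or . in a keyword literal, the second lets BALL match ball.

```python
import re


def build_keyword_regex(keywords, ignore_case=True):
    """Compile one pattern that matches any keyword as a whole word."""
    # re.escape keeps characters such as '+' or '.' in a keyword literal
    alternation = '|'.join(re.escape(k) for k in keywords)
    flags = re.IGNORECASE if ignore_case else 0
    # \b on both sides rejects matches inside a larger word
    return re.compile(r'\b(?:' + alternation + r')\b', flags)


# Hypothetical sample texts standing in for tweet.text values
sample_texts = [
    'I love baseball',
    'Kick the ball!',
    'BALL is life',
    'nothing relevant here',
]

regex = build_keyword_regex(['ball'])
matches = [text for text in sample_texts if regex.search(text)]
print(matches)  # ['Kick the ball!', 'BALL is life']
```

In the timeline code the same compiled pattern would be applied to tweet.text. One caveat: \b only marks a transition between word and non-word characters, so keywords that begin or end with punctuation (a hashtag, for example) would need different boundary handling.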
