How can I filter for specific keywords in extracted tweets? - Twitter API

Jason_Erick23o
January 9, 2022
265 views
0 votes
2 Answers

I have a code that gives me the tweets from my timeline on Twitter and saves them to a CSV. How can I make it search and save only tweets that contain a specific keyword X?

The code is below:

```
access_token = config['twitter']['access_token']
access_token_secret = config['twitter']['access_token_secret']

auth = tweepy.OAuthHandler(api_key, api_key_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

public_tweets = api.home_timeline()
data = []

for tweet in public_tweets:
    data.append([tweet.created_at, tweet.user.screen_name, tweet.text])
```

Answers

- Anil
- January 9, 2022 at 11:58 pm
- 0 votes
Python’s in operator checks whether one string occurs inside another, so you don’t need regex or anything more involved than a simple if, as per the following:
```
query_string = "word" # your keyword

for tweet in public_tweets:
    if query_string in tweet.text:
        data.append([tweet.created_at, tweet.user.screen_name, tweet.text])
```
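One caveat: the plain in check is case-sensitive, so "python" would not match a tweet containing "Python". Here is a minimal sketch of a case-insensitive variant, assuming that is what you want (the question doesn't say). The helper name contains_keyword and the sample strings are made up for illustration and stand in for tweet.text values.

```python
def contains_keyword(text, keyword):
    """Return True if keyword occurs anywhere in text, ignoring case."""
    # casefold() is a more aggressive lower() intended for caseless comparison.
    return keyword.casefold() in text.casefold()


# Hypothetical sample texts standing in for tweet.text values.
samples = ["Learning Python today", "python tips", "Nothing relevant here"]

# Keep only the texts that mention the keyword in any casing.
matches = [s for s in samples if contains_keyword(s, "python")]
```

In the loop above you would call contains_keyword(tweet.text, query_string) in place of the bare in test.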

The simplest approach would be to check if keyword in tweet.text, but you’ll get false positives (e.g. baseball will match if keyword='ball'). A better approach is to use a regex with word boundaries:

```
import tweepy
import configparser
import pandas as pd
import re

# Read the API credentials from config.ini
config = configparser.ConfigParser()
config.read('config.ini')
api_key = config['twitter']['api_key']
api_key_secret = config['twitter']['api_key_secret']
access_token = config['twitter']['access_token']
access_token_secret = config['twitter']['access_token_secret']

auth = tweepy.OAuthHandler(api_key, api_key_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

public_tweets = api.home_timeline()

columns = ['Time', 'User', 'Tweet']

keywords = ['foo', 'bar']
# \b is a word boundary, so 'foo' matches but 'foobar' does not
regex = re.compile(r'\b(' + '|'.join(keywords) + r')\b')
# Keep only tweets whose text contains one of the keywords as a whole word
data = [[tweet.created_at, tweet.user.screen_name, tweet.text]
        for tweet in public_tweets
        if regex.search(tweet.text)]

df = pd.DataFrame(data, columns=columns)
df.to_csv('Tweets.csv')
```

Here `\b` matches a word boundary and `|` separates the alternatives in the group, so the pattern finds any of the keywords only when it stands as a whole word and not as part of some larger word. re.compile is used so the pattern is built once up front and not recompiled on every iteration. The list comprehension is just more readable IMO compared to calling .append() in a loop (and usually a bit faster).
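To see the difference between the substring check and the word-boundary regex without calling the Twitter API, here is a small self-contained sketch on plain strings. The helper build_keyword_regex and the sample texts are hypothetical. Escaping the keywords with re.escape and matching with re.IGNORECASE are additions that are not part of the answer above; drop them if you don't need them.

```python
import re


def build_keyword_regex(keywords):
    """Compile one pattern that matches any keyword as a whole word."""
    # re.escape keeps characters such as '.' in a keyword from acting as regex syntax.
    escaped = (re.escape(k) for k in keywords)
    return re.compile(r'\b(?:' + '|'.join(escaped) + r')\b', re.IGNORECASE)


pattern = build_keyword_regex(['ball', 'bat'])

# Hypothetical sample texts standing in for tweet.text values.
samples = ["I love baseball", "Pass the ball!", "Bat and glove"]

# Whole-word matching skips "baseball" but accepts "ball!" and "Bat".
whole_word = [s for s in samples if pattern.search(s)]

# Plain substring matching also picks up "baseball" (the false positive).
substring = [s for s in samples if 'ball' in s]
```

Note that `\b` only behaves this way for keywords that start and end with a letter, digit, or underscore, so a keyword like "c++" would need a different boundary check.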
