
I am working on text mining in Python using Twitter data to study sentiment on IPOs (Initial Public Offerings) of Indian companies. I need help extracting tweets that contain multiple terms at once – all inclusive. For example, I want tweets in which all three words "Mahindra", "Logistics" and "IPO" appear. Is there a way to do this using the stream function in Python?

I have attached my code below:

    if __name__ == '__main__':
        # This handles Twitter authentication and the connection to the Twitter Streaming API
        l = StdOutListener()
        auth = OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)

        stream = Stream(auth, l)
        # This line filters the Twitter stream to capture data containing any of the keywords 'Mahindra', 'Logistics', 'IPO'
        stream.filter(track=['Mahindra,Logistics,IPO'])
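Note that in the Streaming API's `track` parameter, a comma acts as a logical OR, so the single comma-separated string above matches tweets containing *any* of the three terms (a space within one phrase acts as AND). An alternative, if you prefer to keep the broad stream, is to filter the delivered tweets client-side. A minimal sketch (the function name and the idea of checking the tweet text yourself are my own, not part of the original code):

```python
# Client-side AND filter: keep only tweets whose text contains every term.
# `tweet_text` would come from the JSON payload passed to the listener's on_data.
def contains_all_terms(tweet_text, terms):
    text = tweet_text.lower()
    return all(term.lower() in text for term in terms)

# Example usage inside a listener:
# if contains_all_terms(text, ["Mahindra", "Logistics", "IPO"]):
#     save_tweet(data)
```

The case-insensitive substring check mirrors how the Streaming API matches keywords, so tweets kept by this filter are a strict subset of what the broad `track` query delivers.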

3 Answers


  1. I was not able to comment on your question, so I had to post an answer.

    I haven’t looked into the Twitter APIs, but I do have an alternative: you can use Twitter Scraper and achieve the same result without having to do a lot of coding.

  2. Your code seems to be only an (incomplete) Python fragment, but it still looks familiar to me.
    I use the following script to fetch data from the Twitter Streaming API:

    # To run this code, first edit config.py with your configuration (Auth data), then install necessary modules, then:
    #
    # Call
    #
    # mkdir data
    # python twitter_stream_download.py -q apple -d data
    #
    #
    # It will produce the list of tweets for the query "apple"
    # in the file data/stream_apple.json
    
    # analyse tweets with jq:
    # cat stream_apple.json | jq -s '.[] | {user: .user.name}'
    
    import tweepy
    from tweepy import Stream
    from tweepy import OAuthHandler
    from tweepy.streaming import StreamListener
    import time
    import argparse
    import string
    import config
    import json
    
    def get_parser():
        """Get parser for command line arguments."""
        parser = argparse.ArgumentParser(description="Twitter Downloader")
        parser.add_argument("-q",
                            "--query",
                            dest="query",
                            help="Query/Filter",
                            default='-')
        parser.add_argument("-l",
                            "--lang",
                            dest="languages",
                            help="Languages",
                            default='en')
    
        parser.add_argument("-d",
                            "--data-dir",
                            dest="data_dir",
                            help="Output/Data Directory")
        return parser
    
    
    class MyListener(StreamListener):
        """Custom StreamListener for streaming data."""
    
        def __init__(self, data_dir=".", query=""):
            super(MyListener, self).__init__()
            query_fname = format_filename(query)
            self.outfile = "%s/stream_%s.json" % (data_dir, query_fname)
            print("Writing to '{}'".format(self.outfile))
    
    
        def on_data(self, data):
            try:
                with open(self.outfile, 'a') as f:
                    f.write(data)
                    print(data)
                    return True
            except BaseException as e:
                print("Error on_data: %s" % str(e))
                time.sleep(5)
            return True
    
        def on_error(self, status):
            if status == 420:
                # Returning False in on_error disconnects the stream
                print("rate limited - too many connection attempts. Please wait.")
                return False
            else:
                print(status)
            return True
    
    
    def format_filename(fname):
        """Convert file name into a safe string.
    
        Arguments:
            fname -- the file name to convert
        Return:
            String -- converted file name
        """
        return ''.join(convert_valid(one_char) for one_char in fname)
    
    
    def convert_valid(one_char):
        """Convert a character into '_' if invalid.
    
        Arguments:
            one_char -- the char to convert
        Return:
            Character -- converted char
        """
        valid_chars = "-_.%s%s" % (string.ascii_letters, string.digits)
        if one_char in valid_chars:
            return one_char
        else:
            return '_'
    
    # Monkey-patch tweepy's Status model so every parsed status keeps its raw JSON
    @classmethod
    def parse(cls, api, raw):
        status = cls.first_parse(api, raw)
        setattr(status, 'json', json.dumps(raw))
        return status

    tweepy.models.Status.first_parse = tweepy.models.Status.parse
    tweepy.models.Status.parse = parse
    
    if __name__ == '__main__':
        parser = get_parser()
        args = parser.parse_args()
        auth = OAuthHandler(config.consumer_key, config.consumer_secret)
        auth.set_access_token(config.access_token, config.access_secret)
        api = tweepy.API(auth)
    
        twitter_stream = Stream(auth, MyListener(args.data_dir, args.query))
        # Note: older tweepy releases accepted an `async` keyword here; it was
        # renamed to `is_async` because `async` is reserved in Python 3.7+.
        twitter_stream.filter(track=[args.query], languages=[args.languages])
    

    Create an output directory first, and a file config.py:

    consumer_key = "7r..."
    consumer_secret = "gp..."
    access_token = "5Q..."
    access_secret = "a3..."
    

    Then call it like this:

    python twitter_stream_download.py --query "#Logistics" -d data
    
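    The `format_filename`/`convert_valid` helpers in the script above explain why a query like `apple` ends up in `data/stream_apple.json`. They can be exercised standalone; a minimal sketch reproducing just those two functions:

    ```python
    import string

    # Replicates the script's filename sanitisation: any character outside
    # letters, digits, '-', '_' and '.' becomes an underscore.
    def convert_valid(one_char):
        valid_chars = "-_.%s%s" % (string.ascii_letters, string.digits)
        return one_char if one_char in valid_chars else '_'

    def format_filename(fname):
        return ''.join(convert_valid(c) for c in fname)

    # A query like "#Logistics" becomes a safe file-name fragment:
    print(format_filename("#Logistics"))  # _Logistics
    ```

    So calling the script with `--query "#Logistics"` writes to `data/stream__Logistics.json` (the `#` is replaced by `_`).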
  3. I had this exact issue (and I needed to look for tweets that were more than a week old). Since the existing packages were too slow, I decided to create a small package called Twper. I think you might find it interesting. There’s an example in the Readme that solves your exact issue.

    Disclaimer: I am the author of this package and it is relatively new but hopefully it’ll help.
