
I am working on text mining in Python using Twitter data to study sentiment on IPOs (Initial Public Offerings) of Indian companies. I need help extracting tweets that contain multiple terms at once – all inclusive. For example, I want tweets in which all three words "Mahindra", "Logistics" and "IPO" appear. Is there a way to do this using the stream function in Python?

I have attached my code below:

    if __name__ == '__main__':
        # This handles Twitter authentication and the connection to the Twitter Streaming API
        l = StdOutListener()
        auth = OAuthHandler(consumer_key, consumer_secret)
        auth.set_access_token(access_token, access_token_secret)

        stream = Stream(auth, l)
        # This line filters the Twitter stream to capture data containing any of the keywords 'Mahindra', 'Logistics', 'IPO'
        stream.filter(track=['Mahindra,Logistics,IPO'])
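Note that in the Streaming API's `track` parameter, a comma acts as a logical OR, so the single comma-separated string above matches tweets containing *any* of the three terms (a space within one phrase acts as AND). An alternative, if you prefer to keep the broad stream, is to filter the delivered tweets client-side. A minimal sketch (the function name and the idea of checking the tweet text yourself are my own, not part of the original code):

```python
# Client-side AND filter: keep only tweets whose text contains every term.
# `tweet_text` would come from the JSON payload passed to the listener's on_data.
def contains_all_terms(tweet_text, terms):
    text = tweet_text.lower()
    return all(term.lower() in text for term in terms)

# Example usage inside a listener:
# if contains_all_terms(text, ["Mahindra", "Logistics", "IPO"]):
#     save_tweet(data)
```

The case-insensitive substring check mirrors how the Streaming API matches keywords, so tweets kept by this filter are a strict subset of what the broad `track` query delivers.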

3 Answers


  1. I was not able to comment on your question, so I had to post an answer.

    I haven’t looked into the Twitter APIs, but I do have an alternative: you can use Twitter Scraper and achieve the same result without having to do a lot of coding.

  2. Your code seems to be only an (incomplete) Python fragment, but it still looks familiar to me.
    I use the following script to fetch data from the Twitter Streaming API:

    # To run this code, first edit config.py with your configuration (Auth data), then install necessary modules, then:
    #
    # Call
    #
    # mkdir data
    # python twitter_stream_download.py -q apple -d data
    #
    #
    # It will produce the list of tweets for the query "apple"
    # in the file data/stream_apple.json
    
    # analyse tweets with jq:
    # cat stream_apple.json | jq -s '.[] | {user: .user.name}'
    
    import tweepy
    from tweepy import Stream
    from tweepy import OAuthHandler
    from tweepy.streaming import StreamListener
    import time
    import argparse
    import string
    import config
    import json
    
    def get_parser():
        """Get parser for command line arguments."""
        parser = argparse.ArgumentParser(description="Twitter Downloader")
        parser.add_argument("-q",
                            "--query",
                            dest="query",
                            help="Query/Filter",
                            default='-')
        parser.add_argument("-l",
                            "--lang",
                            dest="languages",
                            help="Languages",
                            default='en')
    
        parser.add_argument("-d",
                            "--data-dir",
                            dest="data_dir",
                            help="Output/Data Directory")
        return parser
    
    
    class MyListener(StreamListener):
        """Custom StreamListener for streaming data."""
    
        def __init__(self, data_dir=".", query=""):
            super(MyListener, self).__init__()
            query_fname = format_filename(query)
            self.outfile = "%s/stream_%s.json" % (data_dir, query_fname)
            print("Writing to '{}'".format(self.outfile))
    
    
        def on_data(self, data):
            try:
                with open(self.outfile, 'a') as f:
                    f.write(data)
                    print(data)
                    return True
            except BaseException as e:
                print("Error on_data: %s" % str(e))
                time.sleep(5)
            return True
    
        def on_error(self, status):
            if status == 420:
                # Returning False in on_error disconnects the stream
                print("rate limited - too many connection attempts. Please wait.")
                return False
            else:
                print(status)
            return True
    
    
    def format_filename(fname):
        """Convert file name into a safe string.
    
        Arguments:
            fname -- the file name to convert
        Return:
            String -- converted file name
        """
        return ''.join(convert_valid(one_char) for one_char in fname)
    
    
    def convert_valid(one_char):
        """Convert a character into '_' if invalid.
    
        Arguments:
            one_char -- the char to convert
        Return:
            Character -- converted char
        """
        valid_chars = "-_.%s%s" % (string.ascii_letters, string.digits)
        if one_char in valid_chars:
            return one_char
        else:
            return '_'
    
    # Monkey-patch tweepy's Status model so every parsed status keeps its raw JSON
    @classmethod
    def parse(cls, api, raw):
        status = cls.first_parse(api, raw)
        setattr(status, 'json', json.dumps(raw))
        return status

    tweepy.models.Status.first_parse = tweepy.models.Status.parse
    tweepy.models.Status.parse = parse
    
    if __name__ == '__main__':
        parser = get_parser()
        args = parser.parse_args()
        auth = OAuthHandler(config.consumer_key, config.consumer_secret)
        auth.set_access_token(config.access_token, config.access_secret)
        api = tweepy.API(auth)
    
        twitter_stream = Stream(auth, MyListener(args.data_dir, args.query))
        # Note: older tweepy releases accepted an `async` keyword here; it was
        # renamed to `is_async` because `async` is reserved in Python 3.7+.
        twitter_stream.filter(track=[args.query], languages=[args.languages])
    

    Create an output directory first, and a file config.py:

    consumer_key = "7r..."
    consumer_secret = "gp..."
    access_token = "5Q..."
    access_secret = "a3..."
    

    Then call it like this:

    python twitter_stream_download.py --query "#Logistics" -d data
    
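    The `format_filename`/`convert_valid` helpers in the script above explain why a query like `apple` ends up in `data/stream_apple.json`. They can be exercised standalone; a minimal sketch reproducing just those two functions:

    ```python
    import string

    # Replicates the script's filename sanitisation: any character outside
    # letters, digits, '-', '_' and '.' becomes an underscore.
    def convert_valid(one_char):
        valid_chars = "-_.%s%s" % (string.ascii_letters, string.digits)
        return one_char if one_char in valid_chars else '_'

    def format_filename(fname):
        return ''.join(convert_valid(c) for c in fname)

    # A query like "#Logistics" becomes a safe file-name fragment:
    print(format_filename("#Logistics"))  # _Logistics
    ```

    So calling the script with `--query "#Logistics"` writes to `data/stream__Logistics.json` (the `#` is replaced by `_`).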
  3. I had this exact issue (and I needed to look for tweets that were more than a week old). Since the existing packages were too slow, I decided to create a small package called Twper. I think you might find it interesting. There’s an example in the Readme that solves your exact issue.

    Disclaimer: I am the author of this package and it is relatively new but hopefully it’ll help.
