
I have a Python script that extracts Twitter data via the streaming API. I would like to write each day's data to a separate file. My plan is to let the script run for 24 hours, then kill and restart it, because the file name is computed at startup and therefore changes on restart.

How can I ensure that the script is stopped at 00:00 and restarts right away?
The code can be found below. If you have any other ideas about how I can create a new text file daily, this would be even better.

import tweepy
import datetime
key_words = ["xx"]
twitter_data_title = "".join([xx, "_", date_today, ".txt"])

class TwitterStreamer():

    def __init__(self):
        pass

    def stream_tweets(self, twitter_data_title, key_words):
        listener = StreamListener(twitter_data_title)
        auth = tweepy.OAuthHandler(api_key, api_secret_key)
        auth.set_access_token(access_token, access_secret_token)
        stream = tweepy.Stream(auth, listener)
        stream.filter(track=key_words)


class StreamListener(tweepy.StreamListener):

    def __init__(self, twitter_data_title):
        self.fetched_tweets_filename = twitter_data_title

    def on_data(self, data):
        try:
            print(data)
        
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
            return True
        except BaseException as e:
            print("Error on_data %s" % str(e))
        return True
    
    def on_exception(self, exception):
        print('exception', exception)
        stream_tweets(twitter_data_title, key_words)    

    def on_error(self, status):
        print(status)
    
def stream_tweets(twitter_data_title, key_words):
    listener = StreamListener(twitter_data_title)
    auth = tweepy.OAuthHandler(api_key, api_secret_key)
    auth.set_access_token(access_token, access_secret_token)
    stream = tweepy.Stream(auth, listener)
    stream.filter(track=key_words)
    
    
if __name__ == '__main__':
    twitter_streamer = TwitterStreamer()
    twitter_streamer.stream_tweets(twitter_data_title, key_words)

2 Answers


  1. I would add this to your code:

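    import os
    from threading import Timer

    def stopTheScript():
        # "anotherscript.py" is a placeholder for whatever should take over.
        exec(open("anotherscript.py").read())
        # exit() would only raise SystemExit inside the timer thread,
        # so os._exit is used to end the whole process.
        os._exit(0)

    Timer(86400, stopTheScript).start()  # 86400 s = 24 h after start, not at 00:00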
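
Since the question asks for a stop at 00:00 rather than 24 hours after launch, the timer delay could be computed from the current time instead of hard-coded. The following is only a sketch: the helper name `seconds_until_midnight` is made up for illustration, and it works on naive local time, so on a daylight-saving changeover day the real delay may be off by an hour.

```python
import datetime


def seconds_until_midnight(now=None):
    """Return the seconds from `now` until the next 00:00 (naive local time)."""
    if now is None:
        now = datetime.datetime.now()
    # Midnight at the start of the following calendar day.
    tomorrow = now.date() + datetime.timedelta(days=1)
    next_midnight = datetime.datetime.combine(tomorrow, datetime.time.min)
    return (next_midnight - now).total_seconds()
```

The result could then be passed as the first argument to `Timer` in place of `86400`.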
    
  2. It looks like the ‘blocking’ code in your example comes from another library, so you don’t have the opportunity to (easily) change the inner loop to check for a condition and exit.

    Using a Background Process (Not Ideal)

    You could change your entry point to start the code in a background process, and check to see if the file’s title should have changed:

    from multiprocessing import Process
    from time import sleep
    
    ...
    
    if __name__ == "__main__":
        twitter_streamer = TwitterStreamer() 
        twitter_data_title, process = None, None     
    
        while True:
            new_data_title = "".join([xx, "_", str(datetime.date.today()), ".txt"])
    
            if new_data_title == twitter_data_title:  # Nothing to do.
                sleep(60)  # Sleep for a minute
                continue  # And check again
    
            # Set the new title.
            twitter_data_title = new_data_title
    
            # If the process is already running, terminate and join it.
            if process is not None:
                process.terminate()
                process.join()
    
            process = Process(target=twitter_streamer.stream_tweets, args=[twitter_data_title, key_words])
            process.start()
    

    Changing StreamListener

    A better alternative would probably be to encode the knowledge of the date into StreamListener. Instead of passing a file name (twitter_data_title), pass a file prefix (xx from your example), and build the filename in a property:

    ...
    
    class StreamListener(tweepy.StreamListener):
    
        def __init__(self, file_prefix):
            self.prefix = file_prefix
    
        @property
        def fetched_tweets_filename(self):
            """The file name for the tweets."""
            date = datetime.date.today()
            return f"{self.prefix}_{date}.txt"
    
        ...
    
    ...
    
    if __name__ == "__main__":
        twitter_streamer = TwitterStreamer()
        twitter_streamer.stream_tweets(xx, key_words)
    
    
    

    Since StreamListener.on_data reads self.fetched_tweets_filename on every call, the property is re-evaluated for each incoming tweet, so tweets should start going to a new file as soon as the date changes, without restarting the script.
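
The same idea can be tried without tweepy or a live stream. This is a minimal sketch under assumptions: `DailyFileWriter` and its `today` parameter are hypothetical names introduced here so that the date source can be swapped out when testing. They are not part of tweepy or of the code above.

```python
import datetime


class DailyFileWriter:
    """Append text to '<prefix>_<YYYY-MM-DD>.txt', picking the file at write time."""

    def __init__(self, prefix, today=datetime.date.today):
        self.prefix = prefix
        # Injectable date source (an assumption, to make the rollover testable).
        self._today = today

    @property
    def filename(self):
        # Re-evaluated on every access, so the name follows the calendar date.
        return f"{self.prefix}_{self._today()}.txt"

    def write(self, data):
        # Opening in append mode per write keeps each day's file separate.
        with open(self.filename, "a", encoding="utf-8") as fh:
            fh.write(data)
```

Opening the file on every write is simple but not free. For a high-volume stream, one option would be to keep the handle open and reopen it only when the computed name changes.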
