
I am using the Twitter API v2 StreamingClient from the Python module Tweepy. I am currently running a short stream that collects tweets and saves each tweet's ID and text as a JSON object, appending it to a file.

My goal is to collect the Twitter handle of each tweet's author and save it to a JSON file (and preferably print it in the terminal as well).

This is what the current code looks like:

import json
import time

import tweepy

KEY_FILE = './keys/bearer_token'
DURATION = 10  # seconds to keep the stream open

def on_data(json_data):
    json_obj = json.loads(json_data.decode())
    #print('Received tweet:', json_obj)
    print(f'Tweet Screen Name: {json_obj.user.screen_name}')  # this is the line that fails
    with open('./collected_tweets/tweets.json', 'a') as out:
        json.dump(json_obj, out)
        out.write('\n')  # one JSON object per line

with open(KEY_FILE) as f:
    bearer_token = f.read().strip()
streaming_client = tweepy.StreamingClient(bearer_token)
streaming_client.on_data = on_data
streaming_client.sample(threaded=True)
time.sleep(DURATION)
streaming_client.disconnect()

I have no idea how to do this; the only thing I found is that someone did this:

json_obj.user.screen_name

However, this did not work at all, and I am completely stuck.


Answers


  1. Chosen as BEST ANSWER
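    import json
    import time

    import tweepy

    KEY_FILE = './keys/bearer_token'
    DURATION = 10

    def on_data(json_data):
        json_obj = json.loads(json_data.decode())
        # with expansions="author_id", the author's username is in json_obj["includes"]["users"]
        print('Received tweet:', json_obj)
        with open('./collected_tweets/tweets.json', 'a') as out:
            json.dump(json_obj, out)
            out.write('\n')  # one JSON object per line

    def on_finish(response):
        # tweepy calls on_closed(response) if Twitter closes the stream
        print('Stream closed:', response)

    with open(KEY_FILE) as f:
        bearer_token = f.read().strip()
    streaming_client = tweepy.StreamingClient(bearer_token)
    streaming_client.on_data = on_data
    streaming_client.on_closed = on_finish
    streaming_client.sample(threaded=True, expansions="author_id", user_fields="username", tweet_fields="created_at")
    time.sleep(DURATION)
    streaming_client.disconnect()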
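The code above dumps the whole payload. Since the goal was the handle itself, here is a minimal stdlib-only sketch of pulling it out of the raw v2 payload that `on_data` receives, assuming the stream was started with `expansions="author_id"` so the author appears in `includes.users`. The helpers `author_username` and `save_tweet` are hypothetical names, and the output is written as JSON Lines (one object per line) so the file stays parseable.

```python
import json
from typing import Optional


def author_username(payload: dict) -> Optional[str]:
    """Return the author's handle from a raw v2 stream payload, or None."""
    author_id = payload.get("data", {}).get("author_id")
    if author_id is None:  # author_id is only present when it was expanded
        return None
    # With expansions="author_id", expanded User objects sit in includes.users;
    # match on id rather than assuming the author is the first entry.
    for user in payload.get("includes", {}).get("users", []):
        if user.get("id") == author_id:
            return user.get("username")
    return None


def save_tweet(payload: dict, path: str) -> None:
    """Print the handle and append id, text and username as one JSON line."""
    if "data" not in payload:  # e.g. an error-only message from the stream
        return
    username = author_username(payload)
    print(f"Tweet Screen Name: {username}")
    record = {
        "id": payload["data"]["id"],
        "text": payload["data"]["text"],
        "username": username,
    }
    with open(path, "a", encoding="utf-8") as out:
        out.write(json.dumps(record) + "\n")
```

Inside the question's `on_data`, this would be called as `save_tweet(json.loads(json_data), './collected_tweets/tweets.json')`.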
    

  2. So, a couple of things:

    Firstly, I'd recommend using on_response rather than on_data, because StreamingClient already defines an on_data method that parses the JSON. It then fires on_tweet, on_includes, on_errors, etc., and finally on_response.

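    Secondly, json_obj.user.screen_name belongs to the v1.1 API's Tweet object, which is why it doesn't work: v2 tweets carry an author_id instead of a nested user object. (Also, json.loads returns a plain dict, so even with v1.1 data you would need json_obj["user"]["screen_name"].)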
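To see the difference concretely, here is a small sketch with two hypothetical, trimmed payloads. The field names follow the v1.1 and v2 Tweet objects; the values are made up. It shows that attribute access fails on the dict returned by `json.loads`, and that a v2 tweet has no `user` object at all.

```python
import json

# Hypothetical, trimmed payloads; field names follow the v1.1 and v2
# Tweet objects, the values are made up for illustration.
V1_TWEET = '{"id_str": "1", "text": "hi", "user": {"screen_name": "example_user"}}'
V2_TWEET = (
    '{"data": {"id": "1", "text": "hi", "author_id": "42"},'
    ' "includes": {"users": [{"id": "42", "username": "example_user"}]}}'
)

v1 = json.loads(V1_TWEET)
v2 = json.loads(V2_TWEET)

# json.loads returns plain dicts, so `.user` fails even on a v1.1 payload
try:
    v1.user
except AttributeError as exc:
    print("attribute access fails:", exc)

# v1.1: the handle is nested under the tweet's "user" object
print(v1["user"]["screen_name"])

# v2: the tweet itself has no "user" object, only an author_id
print("user" in v2["data"])  # False

# the v2 handle arrives in includes.users when author_id is expanded
print(v2["includes"]["users"][0]["username"])
```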


    To get extra data with Twitter API v2, you'll want to use Expansions and Fields (Tweepy Documentation, Twitter Documentation).

    For your case, you'll want "username", which is one of the user_fields, together with the author_id expansion: user_fields only apply to user objects that an expansion pulls in.
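Under the hood, these keyword arguments become query parameters on the v2 sample-stream request. The sketch below builds that URL, assuming tweepy comma-joins list arguments and sends them as `expansions` and `user.fields` to `GET /2/tweets/sample/stream`; `sample_stream_url` is a hypothetical helper for illustration, not part of tweepy. Because `user.fields` only shapes User objects that an expansion brings in, `expansions=author_id` is what actually gets the author into the response.

```python
from urllib.parse import urlencode

SAMPLE_STREAM_URL = "https://api.twitter.com/2/tweets/sample/stream"


def sample_stream_url(expansions, user_fields):
    """Build the v2 sample-stream URL for the given expansions and user fields.

    Multiple values are comma-joined, as the v2 API expects.
    """
    params = {
        "expansions": ",".join(expansions),
        "user.fields": ",".join(user_fields),
    }
    # safe="," keeps the commas readable instead of encoding them as %2C
    return f"{SAMPLE_STREAM_URL}?{urlencode(params, safe=',')}"


print(sample_stream_url(["author_id"], ["id", "name", "username"]))
```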

    import time

    import tweepy

    def on_response(response: tweepy.StreamResponse):
        tweet: tweepy.Tweet = response.data
        if tweet is None:  # e.g. a message that only carries errors
            return
        users: list = response.includes.get("users")
        # response.includes is a dictionary of the objects pulled in by expansions
        # response.includes["users"] is a list of `tweepy.User`
        # with only the author_id expansion, that list holds just the author;
        # if you also expand mentions, mentioned users show up there too,
        # so matching on tweet.author_id is safer than relying on the order
        author = next((u for u in (users or []) if u.id == tweet.author_id), None)
        author_username = author and author.username
        print(tweet.text, author_username)

    # bearer_token and DURATION as in the question's code
    streaming_client = tweepy.StreamingClient(bearer_token)
    streaming_client.on_response = on_response
    # user_fields only take effect for users pulled in by the author_id expansion
    streaming_client.sample(threaded=True, expansions="author_id", user_fields=["id", "name", "username"])

    time.sleep(DURATION)
    streaming_client.disconnect()
    

    Hope this helped.

    Also, the Tweepy documentation definitely needs more examples for API v2.
