How to get more details on streaming data using Tweepy StreamingClient of Twitter APIv2

AKMalkadi
January 22, 2023

I used Tweepy's v1.1 streaming to follow a Twitter account. When someone mentioned that account in a reply to a tweet (for example, a "download this video" bot), I received a lot of detail about both the mentioning tweet and the tweet containing the video (tweet info, video links, etc.).

It used to look something like this:

import json
import tweepy

class StdOutListener(tweepy.Stream):

    def on_data(self, data):
        # process stream data here
        struct = json.loads(data)

struct then holds a lot of data, parsed from the JSON payload.

In API v2, I have the following:

class StdOutListener(tweepy.StreamingClient):

    def on_tweet(self, tweet):
        print(tweet)
        print(tweet.data)
        print(tweet.entities)
        print(f"{tweet.id} {tweet.created_at} ({tweet.author_id}): {tweet.text}")

I am getting only a little information, as follows:

INFO:tweepy.streaming:Stream connected
@2hvQqjddfgfdUY96Ah5yW @7bdsfdsfds3h_bot
{'edit_history_tweet_ids': ['1617291424910688874'], 'id': '1617291424910688874', 'text': '@2hvQqjddfgfdUY96Ah5yW @7bdsfdsfds3h_bot'}
None
1617291424910688874 None (None): @2hvQqjddfgfdUY96Ah5yW @7bdsfdsfds3h_bot

How can I get all the data that I used to get in v1.1?

The rest of my code is like this:

printer = StdOutListener(bearer_token)

# add new rules
rule = tweepy.StreamRule(value="@7bdsfdsfds3h_bot")
printer.add_rules(rule)

printer.filter()

Answers

Chosen as BEST ANSWER

I was able to get more details by using tweet_fields and expansions.

Here is an example:

#...
#...
printer.add_rules(rule)

tweet_fields = ["context_annotations", "referenced_tweets", "lang", "author_id", "created_at", "entities"]

printer.filter(tweet_fields=tweet_fields, expansions=["referenced_tweets.id", "in_reply_to_user_id"])

Then, in the on_data handler, I was able to get the information I needed as follows:

import json
import tweepy

class StdOutListener(tweepy.StreamingClient):
    def on_data(self, data):
        print(data)

        # data is the raw JSON payload; expanded objects live under 'includes'
        struct = json.loads(data)
        text = struct['includes']['tweets'][1]['text']
        # get referenced_tweets info (of the second tweet),
        # i.e. the tweet under which my account was mentioned in a reply
        ref_user_screen = struct['includes']['users'][1]['name']
        ref_username = struct['includes']['users'][1]['username']
        ref_id = struct['includes']['users'][1]['id']

        # get info about the user who mentioned my account
        caller_username = struct['includes']['users'][0]['username']
        caller_user_screen = struct['includes']['users'][0]['name']

        # ID of the tweet that triggered the rule
        tweet_id = struct['data']['id']
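
Indexing includes by position, as above, depends on the order in which the expanded objects come back. A minimal sketch of a more defensive lookup matches objects by ID instead. It assumes the author_id, in_reply_to_user_id and referenced_tweets.id expansions were requested. The payload is a made-up example in the v2 data/includes shape, and extract_mention_details is a hypothetical helper name, not part of tweepy:

```python
import json


def extract_mention_details(raw_data):
    """Pick out the caller, the replied-to tweet and its author by ID."""
    payload = json.loads(raw_data)
    tweet = payload["data"]
    includes = payload.get("includes", {})

    # Index expanded objects by ID so the result does not depend on list order.
    users = {u["id"]: u for u in includes.get("users", [])}
    tweets = {t["id"]: t for t in includes.get("tweets", [])}

    # Find the tweet this mention replies to, if there is one.
    replied_to = None
    for ref in tweet.get("referenced_tweets", []):
        if ref["type"] == "replied_to":
            replied_to = tweets.get(ref["id"])
            break

    return {
        "tweet_id": tweet["id"],
        "caller": users.get(tweet.get("author_id")),
        "replied_to_tweet": replied_to,
        "replied_to_user": users.get(tweet.get("in_reply_to_user_id")),
    }


# Hypothetical payload, trimmed to the fields used above.
sample = json.dumps({
    "data": {
        "id": "3",
        "text": "@video_bot",
        "author_id": "10",
        "in_reply_to_user_id": "20",
        "referenced_tweets": [{"type": "replied_to", "id": "2"}],
    },
    "includes": {
        "users": [
            {"id": "20", "name": "Poster", "username": "poster"},
            {"id": "10", "name": "Caller", "username": "caller"},
        ],
        "tweets": [{"id": "2", "text": "original video tweet", "author_id": "20"}],
    },
})

details = extract_mention_details(sample)
print(details["caller"]["username"], "->", details["replied_to_user"]["username"])
```

Any value in the returned dict can be None when the matching expansion was not requested, so callers should check before indexing into it.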

(Edit)

You can find all available fields listed in the Postman example request referenced in Twitter's Filtered Stream documentation:

https://www.postman.com/twitter/workspace/twitter-s-public-workspace/request/9956214-977c147d-0462-4553-adfa-d7a1fe59c3ec

So the full-blown tweepy code will look like this:

from tweepy import StreamingClient

bearer_token = "xxx"
client = StreamingClient(bearer_token)

client.filter(
    expansions="attachments.poll_ids,attachments.media_keys,author_id,geo.place_id,in_reply_to_user_id,referenced_tweets.id,entities.mentions.username,referenced_tweets.id.author_id",
    tweet_fields="attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,possibly_sensitive,public_metrics,referenced_tweets,reply_settings,source,text,withheld,edit_history_tweet_ids,edit_controls",
    poll_fields="duration_minutes,end_datetime,id,options,voting_status",
    place_fields="contained_within,country,country_code,full_name,geo,id,name,place_type",
    user_fields="created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld",
    media_fields="duration_ms,height,media_key,preview_image_url,public_metrics,type,url,width",
)
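
With attachments.media_keys expanded, a tweet carries only media keys; the media objects themselves arrive under includes.media. The sketch below joins the two to recover video links. It assumes variants has been added to media_fields (it is not in the list above, and as far as I know it is the field that carries video URLs). The helper name video_urls and the payload are made up for illustration:

```python
def video_urls(tweet, includes):
    """Return the highest-bitrate MP4 URL for each video attached to tweet."""
    # Media objects are matched to the tweet through their media_key.
    media_by_key = {m["media_key"]: m for m in includes.get("media", [])}

    urls = []
    for key in tweet.get("attachments", {}).get("media_keys", []):
        media = media_by_key.get(key)
        if not media or media.get("type") != "video":
            continue
        # Keep only MP4 variants; playlist variants carry no bit_rate.
        mp4s = [v for v in media.get("variants", [])
                if v.get("content_type") == "video/mp4"]
        if mp4s:
            best = max(mp4s, key=lambda v: v.get("bit_rate", 0))
            urls.append(best["url"])
    return urls


# Hypothetical tweet and includes, trimmed to the fields used above.
tweet = {"id": "2", "attachments": {"media_keys": ["7_1", "3_2"]}}
includes = {
    "media": [
        {"media_key": "3_2", "type": "photo", "url": "https://example.com/p.jpg"},
        {
            "media_key": "7_1",
            "type": "video",
            "variants": [
                {"content_type": "application/x-mpegURL",
                 "url": "https://example.com/v.m3u8"},
                {"content_type": "video/mp4", "bit_rate": 632000,
                 "url": "https://example.com/low.mp4"},
                {"content_type": "video/mp4", "bit_rate": 2176000,
                 "url": "https://example.com/high.mp4"},
            ],
        },
    ]
}

print(video_urls(tweet, includes))
```

For a "download this video" bot, the tweet passed in would typically be the referenced tweet rather than the mention itself.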
