
My goal is to extract the media data from tweets. I'm using the Twitter API v2 through tweepy. When I fetch fewer than 100 tweets I have no problems, but when I use Paginator I get this error:

users = {u["id"]: u for u in tweets.includes['users']}
AttributeError: 'Paginator' object has no attribute 'includes'. 

I have not been able to change the code so that it extracts the media data, and I don't know whether there is another way to get this data. Any help would be appreciated!

import pandas as pd
import tweepy

import config  # local module that holds BEARER_TOKEN

client = tweepy.Client(bearer_token=config.BEARER_TOKEN)

query = 'climate change -is:retweet has:media'

# your start and end time for fetching tweets
start_time = '2020-01-01T00:00:00Z'
end_time = '2020-01-31T00:00:00Z'

# get tweets from the API
tweets = tweepy.Paginator(client.search_all_tweets,
                          query=query,
                          start_time=start_time,
                          end_time=end_time,
                          tweet_fields=['context_annotations', 'created_at','source','public_metrics',
                                                'lang','referenced_tweets','reply_settings','conversation_id',
                                                'in_reply_to_user_id','geo'],
                          expansions=['attachments.media_keys','author_id','geo.place_id'],
                          media_fields=['preview_image_url','type','public_metrics','url'],
                          place_fields=['place_type', 'geo'],
                          user_fields=['name', 'username', 'location', 'verified', 'description',
                                               'profile_image_url','entities'],
                          max_results=100)

# Get users, media, place list from the includes object
users = {u["id"]: u for u in tweets.includes['users']}
media = {m["media_key"]: m for m in tweets.includes['media']}
# places = {p["id"]: p for p in tweets.includes['places']}

# create a list of records
tweet_info_ls = []
# iterate over each tweet and corresponding user details
for tweet in tweets.data:
    # metrics = tweet.organic_metrics
    # User Metadata
    user = users[tweet.author_id]
    # Media files
    attachments = tweet.data['attachments']
    media_keys = attachments['media_keys']
    link_image = media[media_keys[0]].preview_image_url
    url_image = media[media_keys[0]].url
    link_type = media[media_keys[0]].type
    link_public_metrics = media[media_keys[0]].public_metrics
    # Public metrics
    public_metrics = tweet.data['public_metrics']
    retweet_count = public_metrics['retweet_count']
    reply_count = public_metrics['reply_count']
    like_count = public_metrics['like_count']
    quote_count = public_metrics['quote_count']
    tweet_info = {
        'id': tweet.id,
        'author_id': tweet.author_id,
        'lang': tweet.lang,
        'geo': tweet.geo,
        # 'tweet_entities': metrics,
        'referenced_tweets': tweet.referenced_tweets,
        'reply_settings': tweet.reply_settings,
        'created_at': tweet.created_at,
        'text': tweet.text,
        'source': tweet.source,
        'retweet_count': retweet_count,
        'reply_count': reply_count,
        'like_count': like_count,
        'quote_count': quote_count,
        'name': user.name,
        'username': user.username,
        'location': user.location,
        'verified': user.verified,
        'description': user.description,
        'entities': user.entities,
        'profile_image': user.profile_image_url,
        'media_keys': link_image,
        'type': link_type,
        'link_public_metrics': link_public_metrics,
        'url_image': url_image
    }
    tweet_info_ls.append(tweet_info)

# create dataframe from the extracted records
df = pd.DataFrame(tweet_info_ls)

2 Answers


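  1. You have to iterate over the pages to get the includes (and the data, for that matter). A Paginator only yields responses; each page it yields is a separate response with its own data and includes.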

    paginator = tweepy.Paginator(client.search_all_tweets, [...])

    for page in paginator:
        print(page.data)      # The tweets in that page
        print(page.includes)  # The includes in that page
    
  2. The tweepy FAQ (https://docs.tweepy.org/en/stable/faq.html) covers this under "How do I access includes data while using Paginator?":

    Paginator.flatten() flattens the data and iterates over each object.
    To access includes, you'll need to iterate through each response instead.
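
Putting both answers together, the loop from the question could be reworked roughly as below. This is only a sketch under a few assumptions: `collect_media_records` is a hypothetical helper name, each page is assumed to expose `.data` and `.includes` the way the pages in answer 1 do, and the demo pages are stand-in objects so the example runs without API credentials. It also guards against tweets that have no attachments, which the original loop would not survive.

```python
from types import SimpleNamespace


def collect_media_records(pages):
    """Build one record per tweet from an iterable of response pages.

    Assumption: each page has `.data` (a list of tweets, or None) and
    `.includes` (a dict that may contain 'users' and 'media').
    """
    records = []
    for page in pages:
        if not page.data:  # a page can come back without tweets
            continue
        includes = page.includes or {}
        # Rebuild the lookup tables per page: includes only cover that page.
        users = {u.id: u for u in includes.get("users", [])}
        media = {m.media_key: m for m in includes.get("media", [])}
        for tweet in page.data:
            user = users.get(tweet.author_id)
            keys = (tweet.attachments or {}).get("media_keys", [])
            first = media.get(keys[0]) if keys else None
            records.append({
                "id": tweet.id,
                "text": tweet.text,
                "username": user.username if user else None,
                "media_type": first.type if first else None,
                "media_url": first.url if first else None,
                "preview_image_url": first.preview_image_url if first else None,
            })
    return records


# Stand-in pages for an offline check (hypothetical values). Real pages would
# come from tweepy.Paginator(client.search_all_tweets, ...) as in the question.
demo_pages = [
    SimpleNamespace(
        data=[SimpleNamespace(id=1, text="first", author_id=10,
                              attachments={"media_keys": ["m1"]})],
        includes={
            "users": [SimpleNamespace(id=10, username="alice")],
            "media": [SimpleNamespace(media_key="m1", type="photo",
                                      url="https://example.com/1.jpg",
                                      preview_image_url=None)],
        },
    ),
    SimpleNamespace(data=None, includes={}),  # empty page
]
records = collect_media_records(demo_pages)
print(records)
```

With a real client, the paginator from the question can be passed straight in as `pages`, and the returned list can go to `pd.DataFrame(records)` as before. The remaining fields from the question's `tweet_info` dictionary can be added to the record in the same way.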
