Twitter API request failing when created different ways

Amra
February 13, 2022
54 views
0 votes
2 Answers

I am trying to learn programming by replicating the code from a tutorial on the web.

I am puzzled as to why the request sent in line 21 works and the request sent in line 74 does not, when both use the same parameters, endpoint and headers.

I’ve tried many options but none of them seem to work. I’ve spent many hours on this and I am going a bit crazy.

####Update 1######
I’ve commented out the two lines listed below, where the start_time and end_time attributes get added, and the request then seems to work. However, I need to set the start time and end time so that the loop stops before it uses my entire monthly allowance.

params['start_time'] = pre60

params['end_time'] = now

Any idea why adding the start_time and end_time to the parameters is not working?

####Update 2######
If I put a breakpoint on the response line within the while loop, it works intermittently. Why would adding the breakpoint make it work some of the time?

I also added a sleep call before the request, but that didn’t help.

Any help would be much appreciated.

Many thanks.

from datetime import datetime, timedelta
import requests
import pandas as pd
import re
#from main import get_data,jprint


# Twitter API configuration parameters

# setup the API request
endpoint = 'https://api.twitter.com/2/tweets/search/recent'
params = {'max_results': '1',
          'query':  '(tesla OR tsla OR elon musk) (lang:en)',
          'tweet.fields':'created_at,lang'
          }
token = 'asdjklsfdClkjwe23….example'
headers = {'Authorization': "Bearer {}".format(token)}

#Line 20## WHY THIS ONE WORKS? Line 20
response = requests.get(endpoint, headers=headers, params=params)  # send the request
print(response.status_code)  # this one returns 200

dtformat = '%Y-%m-%dT%H:%M:%SZ'  # the date format string required by twitter



def get_data(tweet):
    clean_up_tweet(tweet['text'])
    data = {
        'id': tweet['id'],
        'created_at': tweet['created_at'],
        'text': tweet['text']
    }
    return data


def clean_up_tweet(tweet):
    # Cleanup text from tweets
    whitespace = re.compile(r"\s+")
    web_address = re.compile(r"(?i)http(s):\/\/[a-z0-9.~_\-\/]+")
    tesla = re.compile(r"(?i)@Tesla(?=\b)")
    user = re.compile(r"(?i)@[a-z0-9_]+")

    # we then use the sub method to replace anything matching
    tweet = whitespace.sub(' ', tweet)
    tweet = web_address.sub('', tweet)
    tweet = tesla.sub('Tesla', tweet)
    tweet = user.sub('', tweet)
    return tweet  # return the cleaned text so the caller can use it

# we use this function to subtract 60 mins from our datetime string
def time_travel(now, mins):
    now = datetime.strptime(now, dtformat)
    back_in_time = now - timedelta(minutes=mins)
    return back_in_time.strftime(dtformat)


now = datetime.now()  # get the current datetime, this is our starting point
last_week = now - timedelta(days=7)  # datetime one week ago = the finish line
now = now.strftime(dtformat)  # convert now datetime to format for API

df = pd.DataFrame()  # initialize dataframe to store tweets

while True:
    if datetime.strptime(now, dtformat) < last_week:
        # if we have reached 7 days ago, break the loop
        break
    pre60 = time_travel(now, 60)  # get 60 minutes before 'now'
    # assign from and to datetime parameters for the API
    params['start_time'] = pre60
    params['end_time'] = now
    
    #Line 73##WHY THIS ONE DOES NOT WORK?
    response = requests.get(endpoint, headers=headers, params=params) # send the request
    print(response.status_code) #this one returns 400
    now = pre60  # move the window 60 minutes earlier
    # iteratively append our tweet data to our dataframe

    for tweet in response.json()['data']:
        row = get_data(tweet)  # we defined this function earlier
        df = df.append(row, ignore_index=True)

Tags: connection python

Answers

Chosen as BEST ANSWER
- Amra
- February 15, 2022 at 8:18 pm
- 0 votes
It turns out that the problem was that the Twitter API does not allow querying for tweets posted right now, so an end_time equal to the current time is rejected. The solution was to end the window 10 seconds before now:
```
now = datetime.now() - timedelta(seconds=10)
```
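
A minimal sketch of how that margin could be folded into the hourly window logic from the question. The helper name `build_window` and its `margin_seconds` parameter are hypothetical, and taking the current time in UTC (rather than the local time returned by `datetime.now()`) is an assumption based on the `Z` suffix in the question's date format.

```python
from datetime import datetime, timedelta, timezone

DTFORMAT = '%Y-%m-%dT%H:%M:%SZ'  # same format string as in the question


def build_window(request_time, minutes=60, margin_seconds=10):
    """Return (start_time, end_time) strings for one search window.

    The window ends `margin_seconds` before `request_time`, so end_time is
    never "right now", and it starts `minutes` before that end.
    """
    end = request_time - timedelta(seconds=margin_seconds)
    start = end - timedelta(minutes=minutes)
    return start.strftime(DTFORMAT), end.strftime(DTFORMAT)


# Hypothetical usage: current time in UTC (assumption, see above).
start_time, end_time = build_window(datetime.now(timezone.utc))
params = {'start_time': start_time, 'end_time': end_time}
```

Only the first window needs the margin. Later windows can reuse the previous start_time as their end_time, as the question's loop already does with `now = pre60`.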

(Edit)

Not sure if this will help you directly, but try using a library instead of calling the Twitter API directly, as this would reduce complications on your part.

One such library for Python is tweepy: https://docs.tweepy.org/en/stable/api.html

Also, if you want to extract a large number of tweets, you won’t get the entire result in a single response. You will have to page through the results using the cursor method.

Here is sample code that does this:

import datetime
import math
import time

import pandas as pd
import tweepy
auth = tweepy.OAuth2BearerHandler("YourKey")
api = tweepy.API(auth)
user = api.get_user(screen_name="UserYouWantToUse", include_entities=False)

tweets_df = pd.DataFrame(
    columns=['id', 'text', 'retweet_count', 'favorite_count', 'created_at'])

start_date_tweet = datetime.datetime.now() # this should be the newer date
end_date_tweet = datetime.datetime(2019, 1, 1)  # this should be older date
twitter_date_format = '%a %b %d %X %z %Y'


def addTweetsInDataFrame(max_id = -1):
    if (max_id == -1):   # This is the first API call, max_id is not required
        tweet_timeline = api.user_timeline(
            user_id=user.id, count=200, exclude_replies=True, trim_user=True, include_rts=False)
    else:   # Cursoring starts, need to pass max_id parameter as well
        tweet_timeline = api.user_timeline(
            user_id=user.id, count=200, exclude_replies=True, trim_user=True, include_rts=False, max_id = max_id)
    
    max_id_track = math.inf

    for tweet in tweet_timeline:

        tweet_date = datetime.datetime.strptime(tweet._json['created_at'], twitter_date_format)
        tweet_date = tweet_date.replace(tzinfo=None)    # to remove timezone from datetime object

        if (tweet_date > start_date_tweet): # This tweet is newer than required start date
            continue
            
        if (tweet_date < end_date_tweet):  # crossed the older date limit, all remaining tweets would be older than this date
            return

        tweet_dict = dict()
        tweet_dict['id'] = tweet._json['id']
        tweet_dict['text'] = tweet._json['text']
        tweet_dict['retweet_count'] = tweet._json['retweet_count']
        tweet_dict['favorite_count'] = tweet._json['favorite_count']
        tweet_dict['created_at'] = tweet_date
        
        tweets_df.loc[len(tweets_df.index)] = tweet_dict
        
        max_id_track = min(max_id_track, tweet._json['id'])


    
    if (max_id_track != math.inf):  # Cursor is available for next set of data
        time.sleep(2)               # To avoid hitting API limit
        addTweetsInDataFrame(max_id_track-1)

addTweetsInDataFrame()
tweets_df.to_excel("tweets_df.xlsx", index=False) # Saving in excel file
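
The recursive max_id cursoring above can also be written as a plain loop, which avoids deep recursion on long timelines. The sketch below is only an illustration under stated assumptions: `fetch_page` is a hypothetical callable standing in for `api.user_timeline`, and tweets are represented as plain dicts with an `id` key so the logic can run without network access.

```python
def collect_with_max_id(fetch_page, page_size=200):
    """Collect items newest-first by repeatedly lowering max_id.

    fetch_page(max_id, count) is a hypothetical stand-in for a timeline call.
    It must return a list of dicts with an integer 'id', newest first, and
    only items whose id is <= max_id (or any item when max_id is None).
    """
    collected = []
    max_id = None  # first call: no upper bound
    while True:
        page = fetch_page(max_id, page_size)
        if not page:
            break  # nothing older is left
        collected.extend(page)
        # the next page must be strictly older than the oldest item seen
        max_id = min(item['id'] for item in page) - 1
    return collected


# Fake data source so the sketch runs offline: ids 1..10, newest first.
def fake_fetch(max_id, count):
    ids = [i for i in range(10, 0, -1) if max_id is None or i <= max_id]
    return [{'id': i} for i in ids[:count]]


tweets = collect_with_max_id(fake_fetch, page_size=3)
print([t['id'] for t in tweets])
```

With tweepy, the date checks and the `time.sleep(2)` pause from the code above would go inside the loop.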
