I am trying to learn programming by replicating the code from a tutorial on the web.
I am puzzled as to why the first request (the one before the while loop) works and the second one (inside the loop) does not, when both use the same endpoint, headers and parameters.
I've tried many options, but none of them seem to work. I've spent many hours on this and I am going a bit crazy.
#### Update 1 ####
When I comment out the two lines below, which add the start_time and end_time parameters, the request works. However, I need to set the start and end times so that I can stop the loop before it uses up my entire monthly allowance.
params['start_time'] = pre60
params['end_time'] = now
Any idea why adding start_time and end_time to the parameters makes the request fail?
#### Update 2 ####
If I put a breakpoint on the request inside the while loop, it intermittently works. Why would adding a breakpoint make it work some of the time?
I also tried adding a sleep before the request, but that doesn't help.
Any help would be much appreciated.
Many thanks.
from datetime import datetime, timedelta
import requests
import pandas as pd
import re
#from main import get_data,jprint
# Twitter API configuration parameters
# set up the API request
endpoint = 'https://api.twitter.com/2/tweets/search/recent'
params = {'max_results': '1',
          'query': '(tesla OR tsla OR elon musk) (lang:en)',
          'tweet.fields': 'created_at,lang'
          }
token = 'asdjklsfdClkjwe23….example'
headers = {'Authorization': "Bearer {}".format(token)}
# First request (outside the loop): WHY DOES THIS ONE WORK?
response = requests.get(endpoint, headers=headers, params=params) # send the request
print(response.status_code) # this one returns 200
dtformat = '%Y-%m-%dT%H:%M:%SZ'  # the date format string required by Twitter


def get_data(tweet):
    # keep only the fields we need, with the tweet text cleaned up
    data = {
        'id': tweet['id'],
        'created_at': tweet['created_at'],
        'text': clean_up_tweet(tweet['text'])
    }
    return data


def clean_up_tweet(tweet):
    # clean up the text of a tweet
    whitespace = re.compile(r"\s+")
    web_address = re.compile(r"(?i)http(s?)://[a-z0-9.~_\-/]+")
    tesla = re.compile(r"(?i)@Tesla(?=\b)")
    user = re.compile(r"(?i)@[a-z0-9_]+")
    # we then use the sub method to replace anything matching
    tweet = whitespace.sub(' ', tweet)
    tweet = web_address.sub('', tweet)
    tweet = tesla.sub('Tesla', tweet)
    tweet = user.sub('', tweet)
    return tweet


# we use this function to subtract a number of minutes (60 in the loop below) from our datetime string
def time_travel(now, mins):
    now = datetime.strptime(now, dtformat)
    back_in_time = now - timedelta(minutes=mins)
    return back_in_time.strftime(dtformat)
now = datetime.now()  # get the current datetime, this is our starting point
last_week = now - timedelta(days=7)  # datetime one week ago = the finish line
now = now.strftime(dtformat)  # convert now datetime to format for API

df = pd.DataFrame()  # initialize dataframe to store tweets

while True:
    if datetime.strptime(now, dtformat) < last_week:
        # if we have reached 7 days ago, break the loop
        break
    pre60 = time_travel(now, 60)  # get 60 minutes before 'now'

    # assign from and to datetime parameters for the API
    params['start_time'] = pre60
    params['end_time'] = now

    # Second request (inside the loop): WHY DOES THIS ONE NOT WORK?
    response = requests.get(endpoint, headers=headers, params=params)  # send the request
    print(response.status_code)  # this one returns 400
    now = pre60  # move the window 60 minutes earlier

    # iteratively append our tweet data to our dataframe
    for tweet in response.json()['data']:
        row = get_data(tweet)  # we defined this function earlier
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)
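When a request comes back with 400, the JSON body usually says which parameter was rejected, which is quicker than guessing. A minimal sketch, assuming the body follows the Twitter API v2 error shape with an "errors" list whose entries carry a "message" (and a top-level "detail" as a fallback); error_messages is a hypothetical helper, not part of any library:

```python
def error_messages(body):
    """Collect the human-readable messages from a Twitter API v2 error body.

    Assumes a shape like {"errors": [{"message": ...}, ...], "detail": ...};
    falls back to "detail" when no per-error messages are present.
    """
    messages = [err['message'] for err in body.get('errors', []) if 'message' in err]
    if not messages and 'detail' in body:
        messages.append(body['detail'])
    return messages


# Usage inside the loop, right after the request:
#     if response.status_code != 200:
#         print(response.status_code, error_messages(response.json()))
```

Printing this for the failing request should point straight at start_time or end_time instead of only showing the status code.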
Answers
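It turns out that the problem was that the Twitter API does not accept an end_time equal to the current time: end_time must be at least 10 seconds before the time of the request. The solution was to set end_time to 10 seconds before now. This is most likely also why the breakpoint made it work intermittently: pausing in the debugger can let more than 10 seconds pass between computing now and sending the first request in the loop.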
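A minimal sketch of building the hourly windows with that 10-second buffer, assuming the timestamps should be UTC (the trailing Z in the format means UTC, so datetime.now() in local time can drift away from what Twitter expects). hour_windows is a hypothetical helper; the 60-minute window and 7-day limit mirror the question's loop:

```python
from datetime import datetime, timedelta, timezone

DTFORMAT = '%Y-%m-%dT%H:%M:%SZ'  # same format string as the question's dtformat


def hour_windows(now, days=7, minutes=60, buffer_seconds=10):
    """Yield (start_time, end_time) strings, walking back from `now`.

    The newest end_time sits `buffer_seconds` before `now`, so it is not
    rejected for being too close to the request time. Windows are contiguous:
    each window's end_time equals the previous window's start_time.
    """
    end = now - timedelta(seconds=buffer_seconds)
    finish_line = now - timedelta(days=days)
    while end > finish_line:
        # clamp the oldest window so it never starts before the finish line
        start = max(end - timedelta(minutes=minutes), finish_line)
        yield start.strftime(DTFORMAT), end.strftime(DTFORMAT)
        end = start


if __name__ == '__main__':
    # print only the newest window as a quick check
    for start_time, end_time in hour_windows(datetime.now(timezone.utc)):
        print(start_time, end_time)
        break
```

In the question's loop, each pair would go into params['start_time'] and params['end_time']. Because requests are sent over time, the oldest window may end up slightly more than 7 days old by the time it is requested; if Twitter rejects it, shrinking days a little is one way around that.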
Not sure if this will help you directly, but try using a library instead of calling the Twitter API directly, as it takes care of some of these complications for you.
One such library for python is tweepy: https://docs.tweepy.org/en/stable/api.html
Also, if you want to extract a large number of tweets, you won't get the entire result in a single response; you will have to paginate through the results with a cursor.
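tweepy wraps pagination for you; the underlying mechanism with plain requests is roughly the following. A minimal sketch, assuming the v2 recent search endpoint returns the cursor in meta.next_token and accepts it back as the next_token query parameter. fetch_page and search_all are hypothetical helpers, and max_pages is an illustrative cap that also guards the monthly tweet allowance:

```python
SEARCH_URL = 'https://api.twitter.com/2/tweets/search/recent'


def fetch_page(params, headers):
    """Send one search request and return the parsed JSON body."""
    import requests  # imported here so the pagination loop runs without it

    response = requests.get(SEARCH_URL, headers=headers, params=params)
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of a missing 'data' key
    return response.json()


def search_all(params, headers, fetch=fetch_page, max_pages=10):
    """Collect tweets across pages by following meta.next_token.

    `fetch` is injectable so the loop can be exercised without network access.
    """
    params = dict(params)  # copy so the caller's dict is not mutated
    tweets = []
    for _ in range(max_pages):
        body = fetch(params, headers)
        tweets.extend(body.get('data', []))  # a page with no results has no 'data'
        next_token = body.get('meta', {}).get('next_token')
        if not next_token:
            break  # no cursor means this was the last page
        params['next_token'] = next_token  # ask for the following page
    return tweets
```

Passing fetch as a parameter keeps the paging logic separate from the HTTP call, so the loop can be checked with canned pages before spending real requests.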