
I am trying to learn programming by replicating the code from a tutorial on the web.

I am puzzled as to why the request marked "Line 20" in the code below works (it returns 200) while the request marked "Line 73" does not (it returns 400), even though both use the same endpoint, headers and base parameters.

I've tried many options but none of them seem to work. I've spent many hours on this and I am going a bit crazy.

####Update 1######
I commented out the two lines listed below, where the start_time and end_time attributes get added, and the request seems to work. However, I need to set the start time and end time so that the script stops before it uses up my entire monthly allowance.

params['start_time'] = pre60

params['end_time'] = now

Any idea why adding the start_time and end_time to the parameters is not working?

####Update 2######
If I put a breakpoint on the response line within the while loop, it works intermittently. Why would adding a breakpoint make it work some of the time?

I also added a sleep line before the response, but that doesn’t work.

Any help would be much appreciated.

Many thanks.

from datetime import datetime, timedelta
import requests
import pandas as pd
import re
#from main import get_data,jprint


# Twitter API configuration parameters

# setup the API request
endpoint = 'https://api.twitter.com/2/tweets/search/recent'
params = {'max_results': '1',
          'query':  '(tesla OR tsla OR elon musk) (lang:en)',
          'tweet.fields':'created_at,lang'
          }
token = 'asdjklsfdClkjwe23….example'
headers = {'Authorization': "Bearer {}".format(token)}

# Line 20: why does this one work?
response = requests.get(endpoint, headers=headers, params=params)  # send the request
print(response.status_code)  # this one returns 200

dtformat = '%Y-%m-%dT%H:%M:%SZ'  # the date format string required by twitter



def get_data(tweet):
    # keep only the fields we need, with the tweet text cleaned up
    data = {
        'id': tweet['id'],
        'created_at': tweet['created_at'],
        'text': clean_up_tweet(tweet['text'])
    }
    return data


def clean_up_tweet(tweet):
    # Cleanup text from tweets
    whitespace = re.compile(r"\s+")
    web_address = re.compile(r"(?i)http(s)://[a-z0-9.~_\-/]+")
    tesla = re.compile(r"(?i)@Tesla(?=\b)")
    user = re.compile(r"(?i)@[a-z0-9_]+")

    # we then use the sub method to replace anything matching
    tweet = whitespace.sub(' ', tweet)
    tweet = web_address.sub('', tweet)
    tweet = tesla.sub('Tesla', tweet)
    tweet = user.sub('', tweet)
    return tweet  # return the cleaned text so the caller can use it

# we use this function to subtract 60 mins from our datetime string
def time_travel(now, mins):
    now = datetime.strptime(now, dtformat)
    back_in_time = now - timedelta(minutes=mins)
    return back_in_time.strftime(dtformat)


now = datetime.now()  # get the current datetime, this is our starting point
last_week = now - timedelta(days=7)  # datetime one week ago = the finish line
now = now.strftime(dtformat)  # convert now datetime to format for API

df = pd.DataFrame()  # initialize dataframe to store tweets

while True:
    if datetime.strptime(now, dtformat) < last_week:
        # if we have reached 7 days ago, break the loop
        break
    pre60 = time_travel(now, 60)  # get 60 minutes before 'now'
    # assign from and to datetime parameters for the API
    params['start_time'] = pre60
    params['end_time'] = now
    
    # Line 73: why does this one not work?
    response = requests.get(endpoint, headers=headers, params=params) # send the request
    print(response.status_code) #this one returns 400
    now = pre60  # move the window 60 minutes earlier
    # iteratively append our tweet data to our dataframe

    for tweet in response.json()['data']:
        row = get_data(tweet)  # we defined this function earlier
        # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
        df = pd.concat([df, pd.DataFrame([row])], ignore_index=True)

2 Answers


  1. Chosen as BEST ANSWER

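    It turns out that the problem was that the Twitter API does not accept an end_time equal to the current time: end_time has to be at least 10 seconds before the time of the request. That would also explain why pausing at a breakpoint sometimes helped, since the pause let more than 10 seconds pass between computing now and sending the request. The solution was to start from 10 seconds before now:

    now = datetime.now() - timedelta(seconds=10)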
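
A related point: the trailing Z in the date format denotes UTC, while datetime.now() returns local time, so on a machine that is ahead of UTC the end_time can land in the future even after subtracting 10 seconds. Below is a minimal sketch of building the window in UTC with a safety margin. The names search_window, DT_FORMAT, minutes and margin_seconds are made up for this illustration and are not part of the original code or of any library.

```python
from datetime import datetime, timedelta, timezone

DT_FORMAT = '%Y-%m-%dT%H:%M:%SZ'  # same format string as dtformat in the question


def search_window(request_time, minutes=60, margin_seconds=10):
    """Return (start_time, end_time) strings for a window ending just before request_time.

    request_time must be a timezone-aware datetime. It is converted to UTC
    because the trailing 'Z' in the format string means UTC.
    """
    end = request_time.astimezone(timezone.utc) - timedelta(seconds=margin_seconds)
    start = end - timedelta(minutes=minutes)
    return start.strftime(DT_FORMAT), end.strftime(DT_FORMAT)


# Example with a fixed timestamp so the result is reproducible
fixed = datetime(2022, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
start_time, end_time = search_window(fixed)
print(start_time, end_time)  # 2022-03-01T10:59:50Z 2022-03-01T11:59:50Z
```

In the real script the call would be search_window(datetime.now(timezone.utc)), with the two results assigned to params['start_time'] and params['end_time']. If a request still returns 400, printing response.text shows the error message sent back by the API, which is usually the quickest way to see which parameter was rejected.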
    

  2. Not sure if this will help you directly, but try using a library instead of calling the Twitter API directly, as this should reduce complications on your side.

    One such library for Python is tweepy: https://docs.tweepy.org/en/stable/api.html

    Also, if you want to extract a large number of tweets, you won't get the entire result in a single response. You will have to page through the results with a cursor.

    Here is sample code that does this:

    import datetime
    import math
    import time

    import pandas as pd
    import tweepy

    auth = tweepy.OAuth2BearerHandler("YourKey")
    api = tweepy.API(auth)
    user = api.get_user(screen_name="UserYouWantToUse", include_entities=False)
    
    tweets_df = pd.DataFrame(
        columns=['id', 'text', 'retweet_count', 'favorite_count', 'created_at'])
    
    start_date_tweet = datetime.datetime.now()  # this should be the newer date
    end_date_tweet = datetime.datetime(2019, 1, 1)  # this should be the older date
    twitter_date_format = '%a %b %d %X %z %Y'
    
    
    def addTweetsInDataFrame(max_id = -1):
        if (max_id == -1):   # This is the first API call, max_id is not required
            tweet_timeline = api.user_timeline(
                user_id=user.id, count=200, exclude_replies=True, trim_user=True, include_rts=False)
        else:   # Cursoring starts, need to pass max_id parameter as well
            tweet_timeline = api.user_timeline(
                user_id=user.id, count=200, exclude_replies=True, trim_user=True, include_rts=False, max_id = max_id)
        
        max_id_track = math.inf
    
        for tweet in tweet_timeline:
    
            tweet_date = datetime.datetime.strptime(tweet._json['created_at'], twitter_date_format)
            tweet_date = tweet_date.replace(tzinfo=None)    # to remove timezone from datetime object
    
            if (tweet_date > start_date_tweet): # This tweet is newer than required start date
                continue
                
            if (tweet_date < end_date_tweet):  # crossed the older date limit, all other post would be older than this date
                return
    
            tweet_dict = dict()
            tweet_dict['id'] = tweet._json['id']
            tweet_dict['text'] = tweet._json['text']
            tweet_dict['retweet_count'] = tweet._json['retweet_count']
            tweet_dict['favorite_count'] = tweet._json['favorite_count']
            tweet_dict['created_at'] = tweet_date
            
            tweets_df.loc[len(tweets_df.index)] = tweet_dict
            
            max_id_track = min(max_id_track, tweet._json['id'])
    
    
        
        if (max_id_track != math.inf):  # Cursor is available for next set of data
            time.sleep(2)               # To avoid hitting API limit
            addTweetsInDataFrame(max_id_track-1)
    
    addTweetsInDataFrame()
    tweets_df.to_excel("tweets_df.xlsx", index=False) # Saving in excel file
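
The same paging idea can be applied to the recent search endpoint from the question without a library. As far as I understand the v2 API, each response carries a meta.next_token while more results remain, and passing that value back as the next_token parameter requests the following page. The sketch below assumes that behaviour. collect_pages, fetch_page, max_pages and fake_fetch are hypothetical names for this example, and the fake pages stand in for real responses so the code runs without credentials.

```python
def collect_pages(fetch_page, params, max_pages=10):
    """Follow meta.next_token until it disappears or max_pages is reached.

    fetch_page is any callable that takes a params dict and returns the
    decoded JSON body of one response.
    """
    tweets = []
    page_params = dict(params)  # copy so the caller's dict is not modified
    for _ in range(max_pages):
        payload = fetch_page(page_params)
        # assumption: 'data' may be missing when a page has no results
        tweets.extend(payload.get('data', []))
        next_token = payload.get('meta', {}).get('next_token')
        if next_token is None:
            break  # no further pages
        page_params['next_token'] = next_token
    return tweets


# Stand-in for the real request: two canned pages keyed by the token that requests them
fake_pages = {
    None: {'data': [{'id': '3'}, {'id': '2'}], 'meta': {'next_token': 'abc'}},
    'abc': {'data': [{'id': '1'}], 'meta': {}},
}


def fake_fetch(page_params):
    return fake_pages[page_params.get('next_token')]


result = collect_pages(fake_fetch, {'query': 'tesla'})
print([t['id'] for t in result])  # ['3', '2', '1']
```

With the question's variables, fetch_page could be lambda p: requests.get(endpoint, headers=headers, params=p).json(). The max_pages cap is there to bound how many requests a single run can make, which matters when working against a monthly allowance.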
    