
I'm trying to write the last 24 hours of tweets to a CSV file using tweepy for Python, and I'm getting the following error:

Traceback (most recent call last):
File "**", line 74, in <module>
get_all_tweets("BQ")
File "**", line 66, in get_all_tweets
writer.writerows(outtweets)
File "C:\Users\Barry\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-7: character maps to <undefined>

Can anyone see what is wrong? This was working in some capacity before today.

Code:
def get_all_tweets(screen_name):

    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_token, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    # initialize a list to hold all the tweepy Tweets
    alltweets = []

    # make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.home_timeline (screen_name=screen_name, count=200)

    # save most recent tweets
    alltweets.extend(new_tweets)

    # save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    outtweets = []

    page = 1
    deadend = False


    print ("getting tweets before %s" % (oldest))

    # all subsequent requests use the max_id param to prevent duplicates
    new_tweets = api.home_timeline(screen_name=screen_name, count=200, max_id=oldest, page=page)

    # save most recent tweets
    alltweets.extend(new_tweets)

    # update the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    print ("...%s tweets downloaded so far" % (len(alltweets)))

    for tweet in alltweets:

        if (datetime.datetime.now() - tweet.created_at).days < 1:
            # transform the tweepy tweets into a 2D array that will populate the csv
            outtweets.append([tweet.user.name, tweet.created_at, tweet.text.encode("utf-8")])

        else:
            deadend = True
            return
        if not deadend:
            page += 1

    # write the csv
    with open('%s_tweets.csv' % screen_name, 'w') as f:
        writer = csv.writer(f)
        writer.writerow(["name", "created_at", "text"])
        writer.writerows(outtweets)
    pass


    print ("CSV written")

if __name__ == '__main__':
    # pass in the username of the account you want to download
    get_all_tweets("BQ")

** EDIT 1 **

 with open('%s_tweets.csv' % screen_name, 'w', encode('utf-8')) as f:
 TypeError: an integer is required (got type bytes)

** EDIT 2 **

 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
 UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-7: character maps to <undefined>

2 Answers


  1. It seems that a tweet contains a character that the codec in use cannot encode; the traceback points at cp1252, the Windows default, rather than utf-8. While it may be useful to view the tweet that triggered the error, you can guard against encoding errors by changing tweet.text.encode("utf-8") to either tweet.text.encode("utf-8", "ignore"), tweet.text.encode("utf-8", "replace"), or tweet.text.encode("utf-8", "backslashreplace"). When encoding, ignore drops any character that cannot be encoded; replace substitutes a question mark (?) for it; and backslashreplace substitutes a backslash escape sequence, so that, for example, U+03B1 becomes \u03b1.

    For more on this: https://docs.python.org/3/howto/unicode.html#converting-to-bytes

  2. The problem is that some tweets contain characters that the file's default encoding (cp1252, according to the traceback) cannot represent, so they cannot be written to the file as you opened it.
    If you replace this line

    with open('%s_tweets.csv' % screen_name, 'w') as f:
    

    with this:

    with open('%s_tweets.csv' % screen_name, mode='w', encoding='utf-8') as f:
    

    it should work. Please note that this will only work with Python 3.x, because the built-in open() in Python 2 does not accept an encoding argument.
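As a minimal sketch of the second answer's suggestion, the snippet below writes rows containing characters that cp1252 cannot represent to a CSV file opened with encoding='utf-8', then reads them back. The helper names write_tweets_csv and read_tweets_csv and the sample rows are made up for illustration and are not part of the original script. The sketch passes plain str values to the writer (no .encode() call on the text), and it adds newline='' as the csv module documentation recommends for file objects.

```python
import csv
import os
import tempfile


def write_tweets_csv(path, rows):
    """Write (name, created_at, text) rows to a UTF-8 encoded CSV file.

    Hypothetical helper for illustration; not part of the original script.
    """
    # encoding="utf-8" replaces the platform default (cp1252 in the traceback).
    # newline="" lets the csv module control line endings itself.
    with open(path, mode="w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "created_at", "text"])
        writer.writerows(rows)


def read_tweets_csv(path):
    """Read the file back with the same encoding (hypothetical helper)."""
    with open(path, mode="r", encoding="utf-8", newline="") as f:
        return list(csv.reader(f))


# Made-up sample row: the emoji and the Greek letters have no cp1252 mapping.
sample_rows = [
    ["Barry \U0001F600", "2016-03-01 12:00:00", "caf\u00e9 \u03b1\u03b2\u03b3"],
]

csv_path = os.path.join(tempfile.mkdtemp(), "BQ_tweets.csv")
write_tweets_csv(csv_path, sample_rows)
round_trip = read_tweets_csv(csv_path)
```

Because the file object does the encoding, the rows can stay as ordinary strings. In Python 3, passing bytes to csv.writer would write their repr (for example b'...') into the file rather than the text itself.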

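The error handlers mentioned in the first answer can be compared side by side. This is only an illustrative sketch: the sample string is made up, and cp1252 is used as the target codec because it is the one named in the traceback. UTF-8 is included at the end to show that it encodes the same text without any handler.

```python
# Made-up sample text: "é" exists in cp1252, the emoji does not.
text = "caf\u00e9 \U0001F600"

# The default handler ("strict") raises, as in the traceback.
try:
    text.encode("cp1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "ignore" silently drops the character that cannot be encoded.
ignored = text.encode("cp1252", "ignore")

# "replace" substitutes a question mark when encoding.
replaced = text.encode("cp1252", "replace")

# "backslashreplace" substitutes a backslash escape sequence.
escaped = text.encode("cp1252", "backslashreplace")

# UTF-8 can represent the whole string, so no handler is needed.
as_utf8 = text.encode("utf-8")

print(strict_failed, ignored, replaced, escaped, as_utf8)
```

All three handlers lose or alter information, so they are best seen as a fallback. The built-in open() also accepts an errors argument with the same handler names, which applies the chosen behavior at the point where the file is written.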