Trying to put the last 24 hours of data into a CSV file using tweepy for Python, and getting
Traceback (most recent call last):
  File "**", line 74, in <module>
    get_all_tweets("BQ")
  File "**", line 66, in get_all_tweets
    writer.writerows(outtweets)
  File "C:\Users\Barry\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-7: character maps to <undefined>
as an error. Can anyone see what is wrong? This was working, at least in some capacity, before today.
Code:
import csv
import datetime

import tweepy

# consumer_token, consumer_secret, access_token and access_secret
# are defined elsewhere (not shown in the question)


def get_all_tweets(screen_name):
    # authorize twitter, initialize tweepy
    auth = tweepy.OAuthHandler(consumer_token, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    # initialize a list to hold all the tweepy Tweets
    alltweets = []

    # make initial request for most recent tweets (200 is the maximum allowed count)
    new_tweets = api.home_timeline(screen_name=screen_name, count=200)

    # save most recent tweets
    alltweets.extend(new_tweets)

    # save the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1
    outtweets = []
    page = 1
    deadend = False

    print("getting tweets before %s" % (oldest))

    # all subsequent requests use the max_id param to prevent duplicates
    new_tweets = api.home_timeline(screen_name=screen_name, count=200, max_id=oldest, page=page)

    # save most recent tweets
    alltweets.extend(new_tweets)

    # update the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    print("...%s tweets downloaded so far" % (len(alltweets)))

    for tweet in alltweets:
        if (datetime.datetime.now() - tweet.created_at).days < 1:
            # transform the tweepy tweets into a 2D array that will populate the csv
            outtweets.append([tweet.user.name, tweet.created_at, tweet.text.encode("utf-8")])
        else:
            deadend = True
            return
    if not deadend:
        page += 1

    # write the csv
    with open('%s_tweets.csv' % screen_name, 'w') as f:
        writer = csv.writer(f)
        writer.writerow(["name", "created_at", "text"])
        writer.writerows(outtweets)
    pass

    print("CSV written")


if __name__ == '__main__':
    # pass in the username of the account you want to download
    get_all_tweets("BQ")
EDIT 1
with open('%s_tweets.csv' % screen_name, 'w', encode('utf-8')) as f:
TypeError: an integer is required (got type bytes)
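For reference, the TypeError in EDIT 1 comes from the signature of the built-in open(): its third positional parameter is buffering, not encoding, so the bytes value (per the error message) lands in a slot that must be an int. A minimal sketch, using a hypothetical scratch file; the exact error message differs between Python versions:

```python
import os
import tempfile

# Hypothetical scratch file so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "demo_tweets.csv")

# open(file, mode, buffering, encoding, ...): the third positional argument
# is buffering, which must be an int, so passing bytes raises TypeError.
try:
    open(path, "w", b"utf-8")
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Passing encoding as a keyword argument is the accepted form.
with open(path, "w", encoding="utf-8") as f:
    f.write("\u2603\n")  # a snowman, which cp1252 cannot represent
```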
EDIT 2
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 6-7: character maps to <undefined>
Answers
It seems that the tweet contains a character that cannot be encoded by the codec in use. While it may be useful to look at the tweet that triggered the error, you can prevent such an error in the future by changing
tweet.text.encode("utf-8")
to either
tweet.text.encode("utf-8", "ignore"),
tweet.text.encode("utf-8", "replace"), or
tweet.text.encode("utf-8", "backslashreplace").
"ignore" drops anything that cannot be encoded; "replace" substitutes a replacement marker for the offending character ("?" when encoding, U+FFFD when decoding); and "backslashreplace" replaces the offending character with a backslash escape sequence, so, for example, é would become \xe9 when encoding to ASCII.
For more on this: https://docs.python.org/3/howto/unicode.html#converting-to-bytes
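A short sketch of what the three handlers do. Since UTF-8 can encode any character except lone surrogates, the handlers are shown against cp1252, the codec named in the traceback; the sample string is made up. The last part shows a Python 3 detail: csv.writer converts a bytes value with str(), so an encoded tweet text is written as a literal b'...' string. That suggests (an inference, not confirmed by the question) that the unencodable characters come from a str column such as tweet.user.name rather than from the encoded text.

```python
import csv
import io

# Hypothetical sample text: "é" exists in cp1252, the snowman (U+2603) does not.
text = "caf\u00e9 \u2603"

# UTF-8 can encode any code point except lone surrogates, so this succeeds.
utf8 = text.encode("utf-8")

# cp1252, the codec named in the traceback, rejects the snowman by default.
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)  # character maps to <undefined>

# The three error handlers from the answer, applied to cp1252:
ignored = text.encode("cp1252", "ignore")            # b'caf\xe9 '
replaced = text.encode("cp1252", "replace")          # b'caf\xe9 ?'
escaped = text.encode("cp1252", "backslashreplace")  # b'caf\xe9 \\u2603'

# In Python 3, csv.writer turns a bytes value into its repr via str(),
# so an encoded tweet text ends up in the file as b'...'.
buf = io.StringIO()
csv.writer(buf).writerow(["name", utf8])
print(buf.getvalue())
```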
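Your problem is with the characters in some tweets: you are not able to write them to the file you open, because open() without an encoding argument uses the platform's default encoding, which here is cp1252 (the codec in the traceback), and cp1252 has no mapping for them.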
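If you replace this line
with open('%s_tweets.csv' % screen_name, 'w') as f:
with this:
with open('%s_tweets.csv' % screen_name, 'w', encoding='utf-8') as f:
it should work. Please note that this only works with Python 3.x, since the built-in open() in Python 2 has no encoding argument.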
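A fuller sketch of the write step with that change, assuming Python 3. The rows are hypothetical stand-ins for outtweets, with the text kept as str (no .encode()) so it is stored as text rather than as a b'...' literal. The newline="" argument follows the csv module documentation's recommendation for files passed to csv.writer.

```python
import csv
import os
import tempfile

# Hypothetical rows shaped like outtweets: [name, created_at, text],
# with every field kept as str so nothing is written as b'...'.
rows = [
    ["Barry \u2603", "2016-01-01 12:00:00", "hello \U0001F600"],
]

path = os.path.join(tempfile.mkdtemp(), "BQ_tweets.csv")

# encoding="utf-8" overrides the platform default (cp1252 on this Windows setup);
# newline="" lets the csv module control line endings itself.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "created_at", "text"])
    writer.writerows(rows)

# Read the file back with the same encoding to check the round trip.
with open(path, encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f))
```

Opening the file with encoding="utf-8" is what matters here; the characters that cp1252 rejected are written and read back unchanged.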