I have working code that uses tweepy's StreamListener to stream tweets. It works fine, and I have run it successfully a couple of times with Arabic, English, and French keywords combined.
For some reason, when I insert my whole set of keywords (397), the code fails with this error:
SyntaxError: Non-UTF-8 code starting with '\xd9' in file twitter_streaming_copy.py on line 67, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
Oddly, when I run the code with different subsets of the keywords it works fine; it only stops working when I put them all together. Any idea? Here is my code (I'm using Python 3):
# Chap02-03/twitter_streaming.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import string
import time
import tweepy
from tweepy import Stream
from tweepy.streaming import StreamListener

consumer_key = ".."
consumer_secret = ".."
access_key = ".-."
access_secret = ".."


class CustomListener(StreamListener):
    """Custom StreamListener for streaming Twitter data."""

    def __init__(self, fname):
        safe_fname = format_filename(fname)
        self.outfile = "stream_%s.jsonl" % safe_fname

    def on_data(self, data):
        try:
            with open(self.outfile, 'a') as f:
                f.write(data)
                return True
        except BaseException as e:
            sys.stderr.write("Error on_data: {}\n".format(e))
            time.sleep(5)
        return True

    def on_error(self, status):
        if status == 420:
            sys.stderr.write("Rate limit exceeded\n")
            return False
        else:
            sys.stderr.write("Error {}\n".format(status))
            return True


def format_filename(fname):
    """Convert fname into a safe string for a file name.
    Return: string
    """
    return ''.join(convert_valid(one_char) for one_char in fname)


def convert_valid(one_char):
    """Convert a character into '_' if "invalid".
    Return: string
    """
    valid_chars = "-_.%s%s" % (string.ascii_letters, string.digits)
    if one_char in valid_chars:
        return one_char
    else:
        return '_'


if __name__ == '__main__':
    query = sys.argv[1:]  # list of CLI arguments
    query_fname = ' '.join(query)  # string
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    twitter_stream = Stream(auth, CustomListener(query_fname))
    twitter_stream.filter(track=['saudi لبنان', 'iran لبنان', 'iran lebanon', 'ايران لبنان', 'hezbollah lebanon', 'حزب الله لبنان', 'saoudite liban', 'iran liban', 'hezbollah liban'], async=True)
2 Answers
You haven’t saved your source file as UTF-8. Configure your editor correctly.
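One way to see which lines are affected is to read the file as raw bytes and try to decode each line as UTF-8. The sketch below is only an illustration: the helper name find_non_utf8_lines and the sample data are made up, and the sample mixes UTF-8 and Windows-1256 bytes on purpose to imitate a file that was partly saved in the wrong encoding.

```python
def find_non_utf8_lines(data: bytes):
    """Return (line_number, raw_bytes) for every line that is not valid UTF-8."""
    bad = []
    # Splitting on b"\n" is safe: the byte 0x0A never occurs inside a
    # multi-byte UTF-8 sequence.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Hypothetical sample: line 1 is valid UTF-8, line 2 holds Windows-1256 bytes.
sample = (
    "track = ['لبنان']\n".encode("utf-8")
    + "more = ['ايران']\n".encode("cp1256")
)

bad_lines = find_non_utf8_lines(sample)
for number, raw in bad_lines:
    print("line", number, "is not valid UTF-8:", raw)
```

To check a real script, pass it the result of open(path, "rb").read(). Any line it reports needs to be re-typed or re-saved as UTF-8 in the editor.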
Alternatively, adjust your coding comment at the top; the default for Python 3 is UTF-8, but if you used a different codec you need to specify it in that comment. However, the encoding comment should appear in the first two lines of your file. You have it set on the third line. Quoting from the PEP linked in the error message:
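"To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file"
(Bold emphasis mine: "either as first or second line in the file".)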
Re-arrange your comments to:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Chap02-03/twitter_streaming.py
I moved the first comment down; the #! line must be the first line in the file for it to work. You could also just remove it altogether, since you were not using it.

I reproduced a similar error with the following code by saving the file as Windows-1256 (Arabic):
Output:
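A comparable reproduction can be sketched without an editor by encoding a small source in memory and handing the raw bytes to the built-in compile(), which applies the same coding-comment detection as it does for a file. This is an illustrative sketch, not the code referred to above: the helper names, the file name demo.py, and the keyword line are all made up, and the exact error text may differ between Python versions, so only success or failure is reported.

```python
def source_bytes(coding_line_number: int, declared: str, saved_as: str) -> bytes:
    """Build a tiny script with its coding comment on the given line (1 to 3),
    then encode it as if an editor had saved it in `saved_as`."""
    header = ["# demo header line", "# another comment"]
    header.insert(coding_line_number - 1, "# -*- coding: {} -*-".format(declared))
    text = "\n".join(header + ["keyword = 'لبنان'"]) + "\n"
    return text.encode(saved_as)


def compiles(data: bytes) -> bool:
    """Return True if Python can decode and compile the raw source bytes."""
    try:
        compile(data, "demo.py", "exec")
        return True
    except (SyntaxError, ValueError):
        # SyntaxError is the expected failure; ValueError (which includes
        # UnicodeDecodeError) is caught only defensively.
        return False


cases = [
    (3, "cp1256", "cp1256"),  # declared too late, saved as Windows-1256
    (2, "cp1256", "cp1256"),  # declared in time, saved as declared
    (3, "utf-8", "utf-8"),    # declared too late, but saved as UTF-8
    (2, "utf-8", "cp1256"),   # declared in time, but saved differently
]
for line_number, declared, saved_as in cases:
    ok = compiles(source_bytes(line_number, declared, saved_as))
    print(line_number, declared, saved_as, "compiles" if ok else "fails")
```

The first and last cases should fail and the middle two should compile. A declaration on line 3 is ignored, so the bytes are read as UTF-8, and a declaration that does not match how the file was saved does not help either.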
@Martijn's answer is correct that the coding line must be in the first two lines, but UTF-8 is the default encoding in Python 3 anyway. If the file had been saved in UTF-8, it would have worked even with the comment on the wrong line. Whatever encoding is declared, the file must also be saved in that encoding.
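The placement rule can be checked with the standard library's tokenize.detect_encoding, which implements the same coding-comment lookup in pure Python. The following is a small sketch: the wrapper name detected_encoding and the two byte strings are made up for illustration, and cp1256 is declared only so that a detected declaration is distinguishable from the UTF-8 default.

```python
import io
import tokenize


def detected_encoding(source: bytes) -> str:
    """Return the encoding that PEP 263 detection picks for these source bytes."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Coding comment on line 2: it is honoured.
on_line_2 = (
    b"#!/usr/bin/env python\n"
    b"# -*- coding: cp1256 -*-\n"
    b"# Chap02-03/twitter_streaming.py\n"
)

# Coding comment on line 3: it is ignored and the default applies.
on_line_3 = (
    b"# Chap02-03/twitter_streaming.py\n"
    b"#!/usr/bin/env python\n"
    b"# -*- coding: cp1256 -*-\n"
)

print(detected_encoding(on_line_2))  # cp1256
print(detected_encoding(on_line_3))  # utf-8
```

With the comment on the third line the result is the default, utf-8, which is why a file saved as UTF-8 still works while a file saved in any other encoding does not.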