
I have working code that uses tweepy's StreamListener to stream tweets. It works fine, and I have run it successfully several times with Arabic, English, and French keywords combined.

For some reason, when I insert my whole set of keywords (397), the code fails with this error:

SyntaxError: Non-UTF-8 code starting with '\xd9' in file twitter_streaming_copy.py on line 67, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Oddly, when I run the code with different subsets of the keywords it works fine; it is only when I put them all together that it stops working. Any ideas? Here is my code (I'm using Python 3):

# Chap02-03/twitter_streaming.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import string
import time
import tweepy
from tweepy import Stream
from tweepy.streaming import StreamListener

consumer_key = ".."
consumer_secret = ".."
access_key = ".-."
access_secret = ".."


class CustomListener(StreamListener):
    """Custom StreamListener for streaming Twitter data."""

    def __init__(self, fname):
        safe_fname = format_filename(fname)
        self.outfile = "stream_%s.jsonl" % safe_fname

    def on_data(self, data):
        try:
            with open(self.outfile, 'a') as f:
                f.write(data)
                return True
        except BaseException as e:
            sys.stderr.write("Error on_data: {}\n".format(e))
            time.sleep(5)
        return True

    def on_error(self, status):
        if status == 420:
            sys.stderr.write("Rate limit exceeded\n")
            return False
        else:
            sys.stderr.write("Error {}\n".format(status))
            return True


def format_filename(fname):
    """Convert fname into a safe string for a file name.

    Return: string
    """
    return ''.join(convert_valid(one_char) for one_char in fname)


def convert_valid(one_char):
    """Convert a character into '_' if "invalid".

    Return: string
    """
    valid_chars = "-_.%s%s" % (string.ascii_letters, string.digits)
    if one_char in valid_chars:
        return one_char
    else:
        return '_'


if __name__ == '__main__':
    query = sys.argv[1:]  # list of CLI arguments
    query_fname = ' '.join(query)  # string
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    api = tweepy.API(auth)
    twitter_stream = Stream(auth, CustomListener(query_fname))
    twitter_stream.filter(track=['saudi لبنان', 'iran لبنان', 'iran lebanon', 'ايران لبنان', 'hezbollah lebanon', 'حزب الله لبنان', 'saoudite liban', 'iran liban', 'hezbollah liban'], async=True)

2 Answers


  1. You haven’t saved your source file as UTF-8. Configure your editor correctly.

    Alternatively, adjust the coding comment at the top; the default for Python 3 is UTF-8, but if you used a different codec you need to specify it in that comment. However, the encoding comment must appear in the first two lines of your file; you have it on the third line. Quoting from the PEP linked in the error message:

    To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file[.]

    (Bold emphasis mine)

    Re-arrange your comments to:

    #!/usr/bin/env python
    # -*- coding: <your codec> -*-
    # Chap02-03/twitter_streaming.py
    

    I moved the first comment down; the #! line must be the first line in the file for it to work. You could also just remove it altogether, since you were not using it.

  2. I reproduced a similar error with the following code by saving the file as Windows-1256 (Arabic):

    # Chap02-03/twitter_streaming.py
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    s = ['saudi لبنان', 'iran لبنان', 'iran lebanon', 'ايران لبنان', 'hezbollah lebanon', 'حزب الله لبنان', 'saoudite liban', 'iran liban', 'hezbollah liban']
    

    Output:

      File "C:\test.py", line 4
    SyntaxError: Non-UTF-8 code starting with '\xe1' in file C:\test.py on line 4, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
    

    @Martijn’s answer is correct that the coding line must be within the first two lines, but UTF-8 is the default encoding in Python 3 anyway. If the file had been saved as UTF-8, it would have worked even with the comment on the wrong line. Whatever encoding is declared, the file must actually be saved in that encoding.
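
For anyone who wants to check a file before running it, here is a minimal sketch that combines both answers. It uses the standard library's `tokenize.detect_encoding` to find the encoding Python would assume (it only looks at the first two lines, as PEP 263 requires) and then tries to decode every line with it. The helper names `check_source_encoding` and `write_demo`, the temporary demo files, and the choice of `cp1256` are my own assumptions for illustration; they are not from the question.

```python
import os
import tempfile
import tokenize


def check_source_encoding(path):
    """Return (encoding, problem) for a Python source file.

    encoding: what Python would assume, based on the first two lines only.
    problem:  description of the first line that fails to decode, or None.
    """
    with open(path, "rb") as f:
        # Falls back to 'utf-8' when no coding comment is found in lines 1-2.
        encoding, _ = tokenize.detect_encoding(f.readline)
        f.seek(0)
        # Line-by-line decoding assumes an ASCII-compatible encoding.
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                return encoding, "line {}: {}".format(lineno, exc.reason)
    return encoding, None


def write_demo(text, encoding):
    """Hypothetical helper: save `text` to a temp .py file in `encoding`."""
    fd, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(fd, "wb") as f:
        f.write(text.encode(encoding))
    return path


# Escapes spell the Arabic word for Lebanon, so this snippet itself stays ASCII.
keyword_line = "s = ['iran \u0644\u0628\u0646\u0627\u0646']\n"

# Coding comment on line 3: too late, so Python assumes UTF-8.
late = ("# Chap02-03/twitter_streaming.py\n"
        "#!/usr/bin/env python\n"
        "# -*- coding: cp1256 -*-\n" + keyword_line)

# Coding comment on line 2: honoured.
early = ("#!/usr/bin/env python\n"
         "# -*- coding: cp1256 -*-\n"
         "# Chap02-03/twitter_streaming.py\n" + keyword_line)

late_path = write_demo(late, "cp1256")
early_path = write_demo(early, "cp1256")

late_result = check_source_encoding(late_path)
early_result = check_source_encoding(early_path)
print("comment on line 3:", late_result)
print("comment on line 2:", early_result)

os.remove(late_path)
os.remove(early_path)
```

If the check reports a problem under `utf-8`, either re-save the file as UTF-8 in your editor or move a coding comment naming the real codec into the first two lines.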
