
I’m writing a simple – or it should be simple – script to acquire tweets from Twitter’s API (I have developer/app keys and am using the Tweepy interface, not scraping or anything of that sort – I may ditch Tweepy for something closer to the modern API but that is almost certainly not what’s causing this issue here).

I have a MySQL instance which I connect to and can query just fine, until it comes time to insert the tweet – which has a lot of special characters, almost inevitably. To be clear, I am using the official Python driver/connector for MySQL.

import mysql.connector
from mysql.connector import errorcode

Now, I’m aware StackOverflow is LITTERED with threads where people get my exact error – simply stating to check the MySQL syntax manual. These threads, which aren’t all that old (and I’m not using the latest Python, I use 3.7.9 for compatibility with some NLP libraries) insist the answer is to place the string that has the special characters into an old-style format string WITHIN the cursor.execute method, to enclose string variable placeholders in quotes, and to pass a tuple with an empty second value if, as in my case, only one variable is to be inserted. This is also a solution posted as part of a bug report response on the MySQL website – and yet, I have no success.

Here’s what I’ve got – following the directions on dozens of pages here and the official database website:

for tweet in tweepy.Cursor(twilek.search, q=keyword, tweet_mode='extended').items():
            twi_tweet = tweet.full_text
            print(twi_tweet)
            twi_tweet = twi_tweet.encode('utf8')
            requests_total+=1
            os.environ['TWITTER_REQUESTS'] = str(requests_total)
            requests_total = int(os.environ.get('TWITTER_REQUESTS'))
            # insert the archived tweet text into the database table
            sql = 'USE hate_tweets'
            ms_cur.execute(sql)
            twi_tweet = str(twi_tweet)
            insert_tweet = re.sub(r'[^A-Za-z0-9 ]+', '', twi_tweet)
            ms_cur.execute("INSERT INTO tweets_lgbt (text) VALUES %s" % (insert_tweet,))
            cnx.commit()
            print(ms_cur.rowcount, "record inserted.")

(twilek is my Tweepy API object because I’m a dork; ms_cur is the MySQL cursor)

expected result: string formatter passes MySQL a modified tweet string that it can process and add as a row to the tweets_lgbt table

actual result: insertion fails on a syntax error for any tweet

I’ve tried going so far as to use regex to strip everything but alphanumeric and spaces – same issue. I’m wondering if the new string format features of current Python versions have broken compatibility with this connector? I prefer to use the official driver but I’ll switch to an ORM if I must. (I did try the newer features like F strings, and found they caused the same result.)

2 Answers


  1. This is how you should insert a row into your table:

    insert_tweet = "ABCEFg 9 XYZ"
    sql = "INSERT INTO tweets_lgbt (text) VALUES ('%s');" % (insert_tweet)
    
    # sql is now:
    # "INSERT INTO tweets_lgbt (text) VALUES ('ABCEFg 9 XYZ');"
    

    Things to note

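    1. The VALUES list must be wrapped in parentheses, and when the SQL
      is built with the % operator the string placeholder must be
      quoted. The statement in the question has neither. The trailing
      comma in (insert_tweet,) is not the problem: for a single %s, a
      bare string and a one-element tuple format identically.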

    2. If you are trying to insert multiple values at once, you can use cursor.executemany or this answer.
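A quick way to check what the % operator actually produces is to build the statement as a plain string and print it. The sketch below uses no database; the `risky` value is an invented example, included to show why hand-quoted placeholders are fragile once the text contains an apostrophe.

```python
insert_tweet = "ABCEFg 9 XYZ"

# For a single %s, a bare string and a one-element tuple give the same result.
sql_plain = "INSERT INTO tweets_lgbt (text) VALUES ('%s');" % (insert_tweet)
sql_tuple = "INSERT INTO tweets_lgbt (text) VALUES ('%s');" % (insert_tweet,)
print(sql_plain == sql_tuple)

# Hypothetical tweet text containing an apostrophe: the hand-written quotes
# are closed early, so the resulting SQL has an unbalanced quote.
risky = "it's a tweet"
broken_sql = "INSERT INTO tweets_lgbt (text) VALUES ('%s');" % (risky,)
print(broken_sql)
```

Because real tweets routinely contain quotes, building the statement this way is only safe for text you fully control.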

  2. I have these observations:

    • the VALUES clause requires parentheses: VALUES (%s)
    • the quoting / escaping of values should be delegated to the cursor’s execute method, by using unquoted placeholders in the SQL and passing the values as the second argument: cursor.execute(sql, (tweet_text,)) or cursor.executemany(sql, [(tweet_text1,), (tweet_text2,)])
    • once these steps are applied there’s no need for encoding/stringifying/regex-ifying: assuming twi_tweet is a str and the database’s charset/collation supports the full UTF-8 range (for example utf8mb4) then the insert should succeed.
      • in particular, encoding a str and then calling str on the result is to be avoided: you end up with "b'my original string'"
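The encode-then-str pitfall from the last bullet can be reproduced without a database. This is a minimal sketch that replays the relevant lines from the question on a made-up sample string; only the standard library is used.

```python
import re

# Made-up sample text with an emoji, standing in for tweet.full_text.
twi_tweet = "Thrive like a girl 👌"

encoded = twi_tweet.encode('utf8')   # bytes, as in the question
stringified = str(encoded)           # the repr of the bytes, not the text
print(stringified)

# The question's regex then strips the quotes and backslashes, leaving
# the leading "b" and the escape digits embedded in the text.
cleaned = re.sub(r'[^A-Za-z0-9 ]+', '', stringified)
print(cleaned)

# Decoding, not str(), is what recovers the original text from bytes.
print(encoded.decode('utf8') == twi_tweet)
```

Passing the original str straight to the cursor avoids all three steps.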

    This modified version of the code in the question works for me:

    import mysql.connector
    
    DDL1 = """DROP TABLE IF EXISTS tweets_lgbt"""
    DDL2 = """
    CREATE TABLE tweets_lgbt (
        `text` VARCHAR (256))
    """
    
    # From https://twitter.com/AlisonMitchell/status/1332567013701500928?s=20
    insert_tweet = """
    Particularly pleased to see @SarahStylesAU
     quoted in this piece for the work she did
    👌
    
    Thrive like a girl: Why women's cricket in Australia is setting the standard
    """
    
    # Older connector releases don't support the with statement here
    with mysql.connector.connect(database='test') as cnx:
        with cnx.cursor() as ms_cur:
    
            ms_cur.execute(DDL1)
            ms_cur.execute(DDL2)
    
            ms_cur.execute("INSERT INTO tweets_lgbt (`text`) VALUES (%s)", (insert_tweet,))
            cnx.commit()
            print(ms_cur.rowcount, "record inserted.")
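For several tweets at once, the same placeholder style carries over to executemany. The helper below is a hypothetical sketch, not part of the connector: `insert_tweets` is an invented name, and it assumes `cnx` is an open connection (for example from mysql.connector.connect) to a database that already has the tweets_lgbt table.

```python
def insert_tweets(cnx, tweet_texts):
    """Insert each text as one row and return the cursor's reported row count.

    Hypothetical helper: cnx is assumed to be an open DB-API connection.
    """
    sql = "INSERT INTO tweets_lgbt (`text`) VALUES (%s)"
    # One single-element tuple per row; the driver handles quoting/escaping.
    rows = [(text,) for text in tweet_texts]
    cursor = cnx.cursor()
    try:
        cursor.executemany(sql, rows)
        cnx.commit()
        return cursor.rowcount
    finally:
        cursor.close()
```

Usage would look like `insert_tweets(cnx, [tweet.full_text for tweet in results])`, where `results` stands for whatever iterable of tweets the search returned.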
    
    