I’m writing a simple – or it should be simple – script to acquire tweets from Twitter’s API (I have developer/app keys and am using the Tweepy interface, not scraping or anything of that sort – I may ditch Tweepy for something closer to the modern API but that is almost certainly not what’s causing this issue here).
I have a MySQL instance which I connect to and can query just fine, until it comes time to insert the tweet – which has a lot of special characters, almost inevitably. To be clear, I am using the official Python driver/connector for MySQL.
import mysql.connector
from mysql.connector import errorcode
Now, I’m aware StackOverflow is LITTERED with threads where people get my exact error – simply stating to check the MySQL syntax manual. These threads, which aren’t all that old (and I’m not using the latest Python, I use 3.7.9 for compatibility with some NLP libraries) insist the answer is to place the string that has the special characters into an old-style format string WITHIN the cursor.execute method, to enclose string variable placeholders in quotes, and to pass a tuple with an empty second value if, as in my case, only one variable is to be inserted. This is also a solution posted as part of a bug report response on the MySQL website – and yet, I have no success.
Here’s what I’ve got – following the directions on dozens of pages here and the official database website:
for tweet in tweepy.Cursor(twilek.search, q=keyword, tweet_mode='extended').items():
    twi_tweet = tweet.full_text
    print(twi_tweet)
    twi_tweet = twi_tweet.encode('utf8')
    requests_total+=1
    os.environ['TWITTER_REQUESTS'] = str(requests_total)
    requests_total = int(os.environ.get('TWITTER_REQUESTS'))
    # insert the archived tweet text into the database table
    sql = 'USE hate_tweets'
    ms_cur.execute(sql)
    twi_tweet = str(twi_tweet)
    insert_tweet = re.sub(r'[^A-Za-z0-9 ]+', '', twi_tweet)
    ms_cur.execute("INSERT INTO tweets_lgbt (text) VALUES %s" % (insert_tweet,))
    cnx.commit()
    print(ms_cur.rowcount, "record inserted.")
(twilek is my cursor object because I’m a dork)
expected result: string formatter passes MySQL a modified tweet string that it can process and add as a row to the tweets_lgbt table
actual result: insertion fails on a syntax error for any tweet
I’ve tried going so far as to use regex to strip everything but alphanumeric and spaces – same issue. I’m wondering if the new string format features of current Python versions have broken compatibility with this connector? I prefer to use the official driver but I’ll switch to an ORM if I must. (I did try the newer features like F strings, and found they caused the same result.)
2 Answers
This is how you should insert a row into your table:
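A minimal sketch of such an insert, assuming `cursor` is a cursor obtained from a `mysql.connector` connection. The helper names `insert_tweet` and `insert_tweets` are hypothetical and only serve the illustration; the table and column names are taken from the question.

```python
# Unquoted %s placeholder, wrapped in the parentheses that VALUES requires.
INSERT_SQL = "INSERT INTO tweets_lgbt (text) VALUES (%s)"


def insert_tweet(cursor, text):
    """Insert one row; the driver escapes and quotes `text` itself."""
    # The values go in as the SECOND argument of execute(), as a sequence,
    # hence the one-element tuple. No % operator is applied to the SQL string.
    cursor.execute(INSERT_SQL, (text,))


def insert_tweets(cursor, texts):
    """Insert several rows in one call (hypothetical batch variant)."""
    cursor.executemany(INSERT_SQL, [(text,) for text in texts])
```

A call would look like `insert_tweet(ms_cur, tweet.full_text)` followed by `cnx.commit()`, with `ms_cur` and `cnx` being the cursor and connection from the question.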
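Things to note:
- The arguments to a string formatter are just like the arguments to a function. The % operator builds the finished string before execute is called, so adding a comma at the end does not help there: (insert_tweet,) is consumed by the formatter, not passed to execute as a parameter tuple.
- If you are trying to insert multiple values at once, you can use cursor.executemany or this answer.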
I have these observations:
- The VALUES clause requires parentheses: VALUES (%s)
- Quoting and escaping of the values should be left to the cursor's execute method, by using unquoted placeholders in the SQL and passing the values as the second argument: cursor.execute(sql, (tweet_text,)) or cursor.executemany(sql, [(tweet_text1,), (tweet_text2,)])
- With those two changes there is no need to encode, stringify or regex-strip the text: if twi_text is a str and the database's charset/collation supports the full UTF-8 range (for example utf8mb4) then the insert should succeed.
- In particular, encoding a str and then calling str on the result is to be avoided: you end up with "b'my original string'"
This modified version of the code in the question works for me:
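What follows is a sketch along those lines rather than a verbatim listing. It assumes `cnx` and `ms_cur` are the `mysql.connector` connection and cursor from the question, that the `text` column uses a utf8mb4 charset, and that `tweets` is any iterable of objects with a `full_text` attribute. The function name `archive_tweets` is hypothetical.

```python
def archive_tweets(tweets, cnx, ms_cur):
    """Insert the full text of each tweet into tweets_lgbt; return rows inserted."""
    insert_sql = "INSERT INTO tweets_lgbt (text) VALUES (%s)"

    # Select the database once instead of on every iteration.
    ms_cur.execute("USE hate_tweets")

    inserted = 0
    for tweet in tweets:
        # full_text is already a str: no encode(), no str(), no regex.
        twi_tweet = tweet.full_text
        # The driver quotes and escapes the value passed as second argument.
        ms_cur.execute(insert_sql, (twi_tweet,))
        cnx.commit()
        print(ms_cur.rowcount, "record inserted.")
        inserted += ms_cur.rowcount
    return inserted


# Hypothetical call site, reusing the names from the question:
# tweets = tweepy.Cursor(twilek.search, q=keyword, tweet_mode='extended').items()
# archive_tweets(tweets, cnx, ms_cur)
```

Taking the tweets as a plain iterable keeps the database part independent of Tweepy, so the same function can be exercised with a handful of test objects before pointing it at the live API.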