In the example code below, col1 and col2 together make up the primary key of the table.
My question is: should they also be listed after ON DUPLICATE KEY UPDATE, as they are in the code right now, or should they be left out?
Example code:
with Dl.cursor() as cursor:
    for chunk in np.array_split(DataFrame, 10, axis=0):
        for i in chunk.index:
            cursor.execute("INSERT INTO table_example (col1, col2, col3, col4) VALUES (%s, %s, %s, %s) ON DUPLICATE KEY UPDATE col1 = col1, col2 = col2, col3 = col3, col4 = col4;", (chunk['col1'][i], chunk['col2'][i], chunk['col3'][i], chunk['col4'][i]))
            # col3 = col3, col4 = col4; ... Which version is correct?
        Dl.commit()
cursor.close()
Dl.close()
2 Answers
It is not necessary to update the unique or primary key columns, so your SQL can look like this:
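A minimal sketch of such a statement, written as a Python string so it can be passed to cursor.execute. It assumes MySQL's VALUES() function to refer to the value the INSERT tried to write. Note that a bare col3 = col3 only assigns the existing value to itself, so the row would not change. The name UPSERT_SQL is illustrative and not part of the question.

```python
# Illustrative name. The key columns col1 and col2 appear only in the INSERT
# part; the UPDATE part touches the non-key columns only.
UPSERT_SQL = (
    "INSERT INTO table_example (col1, col2, col3, col4) "
    "VALUES (%s, %s, %s, %s) "
    "ON DUPLICATE KEY UPDATE "
    "col3 = VALUES(col3), col4 = VALUES(col4)"
)
```

On MySQL 8.0.20 and later, VALUES() in this position is deprecated in favor of a row alias, though it still works.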
The statement already knows which row it is working on: the existing row whose col1 and col2 matched the duplicate key.
So for your code it should be something like:
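A sketch of how the loop from the question could look with the key columns left out of the update. It assumes a DB-API style connection whose cursor works as a context manager and accepts %s placeholders, as the question's code does. The names upsert_rows, rows and chunk_size are hypothetical, and rows is any sequence of (col1, col2, col3, col4) tuples.

```python
# Assumed statement: only the non-key columns are updated on a duplicate key.
UPSERT_SQL = (
    "INSERT INTO table_example (col1, col2, col3, col4) "
    "VALUES (%s, %s, %s, %s) "
    "ON DUPLICATE KEY UPDATE col3 = VALUES(col3), col4 = VALUES(col4)"
)


def upsert_rows(connection, rows, chunk_size=1000):
    """Insert or update rows in chunks, committing after each chunk."""
    rows = list(rows)
    with connection.cursor() as cursor:
        for start in range(0, len(rows), chunk_size):
            chunk = rows[start:start + chunk_size]
            # One call per chunk instead of one execute per row.
            cursor.executemany(UPSERT_SQL, chunk)
            connection.commit()
    return len(rows)
```

With pandas, the tuples could be produced with DataFrame[['col1', 'col2', 'col3', 'col4']].itertuples(index=False, name=None) and passed as rows, with Dl as the connection.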
If you have no other unique keys that could trigger the ON DUPLICATE KEY UPDATE clause, col1 and col2 won’t change and you should leave them out.
If you do have other unique keys, you probably don’t want to change col1 and col2 anyway.