We have a table with around 10 million records, and we are trying to update some columns using the id (primary key) in the WHERE clause.
UPDATE table_name SET column1=1, column2=0,column3='2022-10-30' WHERE id IN(1,2,3,4,5,6,7,......etc);
Scenario 1: when there are 3000 or fewer ids in the IN clause, EXPLAIN shows PRIMARY in both 'possible_keys' and 'key', and the query executes very fast.
Scenario 2: when there are 3000 or more ids (up to 30K) in the IN clause, EXPLAIN shows NULL in 'possible_keys' and PRIMARY in 'key', and the query runs forever. If I use FORCE INDEX(PRIMARY), then both 'possible_keys' and 'key' show PRIMARY and the query executes very fast.
Scenario 3: when there are more than 30K ids in the IN clause, even with FORCE INDEX(PRIMARY), 'possible_keys' shows NULL, 'key' shows PRIMARY, and the query runs forever.
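For reference, the FORCE INDEX(PRIMARY) variant I tried in scenarios 2 and 3 looks roughly like this (id list shortened for illustration):

UPDATE table_name FORCE INDEX (PRIMARY)
SET column1=1, column2=0, column3='2022-10-30'
WHERE id IN (1,2,3,4,5);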
I believe the optimizer is going for a full table scan instead of an index scan. Can we make any change so that the optimizer goes for an index scan instead of a table scan? Please suggest any parameter changes required to overcome this issue.
The MySQL version is 5.7.
Answers
As far as I know, you need to just provide an ad-hoc table with all the ids and join table_name to it:
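A minimal sketch of that approach for MySQL 5.7, building the id list as a UNION ALL derived table and joining it to table_name on the primary key (id list and column values shortened for illustration):

-- Ad-hoc derived table of ids, joined on the primary key
UPDATE table_name t
JOIN (SELECT 1 AS id UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4) ids
  ON ids.id = t.id
SET t.column1 = 1, t.column2 = 0, t.column3 = '2022-10-30';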
In MySQL 8 you can use a VALUES table constructor, which is a little more terse (omit ROW for MariaDB, e.g. VALUES (1),(2),(3)):
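A sketch of that form, assuming MySQL 8.0.19+ for the VALUES statement and the derived-table column list:

-- VALUES table constructor as the ad-hoc id list (MySQL 8.0.19+)
UPDATE table_name t
JOIN (VALUES ROW(1), ROW(2), ROW(3), ROW(4)) AS ids(id)
  ON ids.id = t.id
SET t.column1 = 1, t.column2 = 0, t.column3 = '2022-10-30';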
When UPDATEing a significant chunk of a table with all the same update values, I see a red flag. Do you always update the same set of rows? Could that info be in a smaller separate table that you JOIN to? Or maybe some other structural schema change that focuses on making the updates faster?
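For example, a sketch assuming a hypothetical small helper table ids_to_update that lists just the rows to change:

-- ids_to_update(id) is a hypothetical small table holding the target ids
UPDATE table_name t
JOIN ids_to_update u ON u.id = t.id
SET t.column1 = 1, t.column2 = 0, t.column3 = '2022-10-30';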
If you must have a long IN list, I suggest doing 100 at a time. And don't try to COMMIT all 3000+ in the same transaction. (Committing in chunks may violate some business logic, so you may not want to do that.)
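A sketch of the chunked approach, assuming each batch of roughly 100 ids is sent and committed as its own transaction:

-- Batch 1: first ~100 ids, committed on its own
START TRANSACTION;
UPDATE table_name SET column1=1, column2=0, column3='2022-10-30'
WHERE id IN (1,2,3 /* ... up to ~100 ids ... */);
COMMIT;

-- Batch 2: next ~100 ids, and so on for the rest of the list
START TRANSACTION;
UPDATE table_name SET column1=1, column2=0, column3='2022-10-30'
WHERE id IN (101,102,103 /* ... next ~100 ids ... */);
COMMIT;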