When I perform EXPLAIN on the following query to fetch users with bookings
SELECT user.id
FROM user
LEFT JOIN booking
ON booking.user_id = user.id
AND booking.end_timestamp > 1706878800
AND booking.end_timestamp <= 1706882400
AND booking.status IN ('pending', 'progress', 'done');
I get the following result:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | user | NULL | index | NULL | status | 1 | NULL | 17644 | 100 | Using index |
1 | SIMPLE | booking | NULL | ref | status,user_id_start_timestamp,user_id_status_end_timestamp,end_timestamp | user_id_status_end_timestamp | 5 | db.user.id | 403 | 0.01 | Using where; Using index |
As you can see, it automatically uses the index user_id_status_end_timestamp
for the booking
table.
But when I add another join to the query to include address
which depends on the booking
table:
SELECT user.id
FROM user
LEFT JOIN booking
ON booking.user_id = user.id
AND booking.end_timestamp > 1706878800
AND booking.end_timestamp <= 1706882400
AND booking.status IN ('pending', 'progress', 'done')
LEFT JOIN address ON booking.address_id = address.id;
now it no longer uses any index for the booking
table making the query very slow even though it shows the same list of possible keys:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | SIMPLE | user | NULL | index | NULL | status | 1 | NULL | 17644 | 100 | Using index |
1 | SIMPLE | booking | NULL | ALL | status,user_id_start_timestamp,user_id_status_end_timestamp,end_timestamp | NULL | NULL | NULL | 9509525 | 100 | Range checked for each record (index map: 0x98040) |
1 | SIMPLE | address | NULL | eq_ref | PRIMARY | PRIMARY | 4 | db.booking.address_id | 1 | 100 | Using index |
Of course, if I add USE INDEX (user_id_status_end_timestamp)
, it works, but why is MySQL not using the index automatically in this case?
I am using MySQL version 5.7.42.
EDIT
Please note that I need both the joins to be left join. The original query is much bigger. I have trimmed it down to focus only on the part where the problem lies for easier understanding and debugging. To explain why they both need to be left joins, here is what one of the where clauses look like:
(booking.id IS NULL OR address.country_id = :countryId)
2
Answers
Remove the
LEFTs
, they seem to be "wrong" for your application.bookings
needsSince you are filtering on dates and status, that should be the first table to work with, but
LEFT
prevents that. Instead, you get all users, together will NULLs from the other tables.The Optimizer will pick between those indexes based on table statistics and the values provided.
For your first query:
the index
user_id_status_end_timestamp (user_id, status, end_timestamp)
is covering, meaning there is no additional read from the clustered index required. When you add the join to theaddress
table, thebooking.address_id
is now required but not available in the index. This now requires an additional read from the clustered index (table data), so MySQL decides the index is not suitable for the whole join (it may still be used for per row range scans).If you add
address_id
to the index, it becomes covering again and should be chosen by the optimizer. Switching the order ofstatus
andend_timestamp
may also improve performance.