Can someone please explain why this mysql query takes forever to execute? Its like it’s stuck in a loop with no results returned.
SELECT m.title
FROM movies m
JOIN movie_starredin c ON m.ratingkey = c.movieid
WHERE c.actorid IN (
SELECT id
FROM movie_actors
WHERE name = 'Harrison Ford'
);
2
Answers
You could "Debug" your query execution time by seperate it into smaller query:
1 . Run this and check exec time:
2 . Run this and check exec time:
3 . Run this and check exec time, with xxxxx is a value of
movie_starredin.actorid
:Each step check if exec time too long/ slow. Then use index to reduce exec time of sql select (search google for this). But consider the tradeoff that is design of the table you want to add index because index could slow down insert, update action to table with large number (millions) of records.
We cannot answer this directly, as you have not included the DDL (
SHOW CREATE TABLE tbl_name;
) for your tables or theEXPLAIN
output for your example query, but I think it is reasonable to assume that it is likely due to a lack of appropriate indices.EXPLAIN
is often the first port of call when wanting to understand the performance of a query. 8.8.2 EXPLAIN Output Format has lots of details on the output.Test schema
This is my test schema with data imported from IMDb:
For these tests, the entire dataset was preloaded into the buffer pool.
Base query
This is my version of your query, given my slightly different schema (returns 603 rows):
Executing this query without any additional indices, I get the following execution plan:
Timing: 29.750 sec / 6.141 sec
Rows accessed: 70,421,877 (quite a lot)
This is bad news, as we have two full table scans (
ALL
in the type column) and they are of large tables (~13M & ~57M rows).Adding an index on
primaryName
Timing: 25.438 sec / 6.328 sec
Rows accessed: 57,378,926 (still a lot)
Here we can see a dramatic improvement in the access of
name_basics
, instead of the full table scan seen in the first query, we are now using the index to access an estimated 4 rows (instead of ~13M), and it is not having to access the clustered index (the actual table data), as everything that is needed is in the secondary index (Using index
in Extra column). Unfortunately, this is only a small improvement for the overall execution time as we are still having to scan the ~57M rows intitle_principals
.Adding indices for FKs
A common approach is just to index the foreign keys, so let’s add single column indices for the FKs on
title_principals
:Timing: 0.016 sec / 0.000 sec
Rows accessed: 1,267 (that’s better)
Here we can see we have got rid of the full table scan for
title_principals
and instead of having to scan ~57M rows, it is now estimating 13 rows.Using composite indices for the junction table
We can take this a step further by adding composite indices instead:
Timing: 0.016 sec / 0.000 sec
Rows accessed: 1,267
The benefit here is less obvious, but it is still there. If you look at the
Extra
column, we now haveUsing index
meaning –When checking the server-side execution time, averaged over 10 runs each, I got 0.007s (single column) and 0.003s (composite).
This has ended up being much longer than I intended, but I hope it proves useful to someone.