skip to Main Content

Can someone please explain why this mysql query takes forever to execute? Its like it’s stuck in a loop with no results returned.

Screenshot

SELECT m.title
FROM movies m
JOIN movie_starredin c ON m.ratingkey = c.movieid
WHERE c.actorid IN ( 
    SELECT id 
    FROM movie_actors 
    WHERE name = 'Harrison Ford'
);

2

Answers


  1. You could "Debug" your query execution time by seperate it into smaller query:

    1 . Run this and check exec time:

    SELECT id 
    FROM movie_actors 
    WHERE name = 'Harrison Ford'
    

    2 . Run this and check exec time:

    SELECT *
    FROM movie_starredin
    WHERE actorid IN ( 
        SELECT id 
        FROM movie_actors 
        WHERE name = 'Harrison Ford'
    );
    

    3 . Run this and check exec time, with xxxxx is a value of movie_starredin.actorid:

    SELECT m.title
    FROM movies m
    JOIN movie_starredin c ON m.ratingkey = c.movieid
    WHERE c.actorid = xxxxx; 
    

    Each step check if exec time too long/ slow. Then use index to reduce exec time of sql select (search google for this). But consider the tradeoff that is design of the table you want to add index because index could slow down insert, update action to table with large number (millions) of records.

    Login or Signup to reply.
  2. please explain why this mysql query takes forever to execute

    We cannot answer this directly, as you have not included the DDL (SHOW CREATE TABLE tbl_name;) for your tables or the EXPLAIN output for your example query, but I think it is reasonable to assume that it is likely due to a lack of appropriate indices.

    EXPLAIN is often the first port of call when wanting to understand the performance of a query. 8.8.2 EXPLAIN Output Format has lots of details on the output.

    Test schema

    This is my test schema with data imported from IMDb:

    CREATE DATABASE `imdb` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
    USE `imdb`;
    
    CREATE TABLE `title_basics` (
      `tconst` int unsigned NOT NULL PRIMARY KEY,
      `titleType` enum('short','movie','tvSeries','tvShort','tvMovie','tvEpisode','tvMiniSeries','tvSpecial','video','videoGame','tvPilot'),
      `primaryTitle` varchar(500) NOT NULL,
      `originalTitle` varchar(500) NOT NULL,
      `isAdult` tinyint(1) NOT NULL DEFAULT 0,
      `startYear` smallint unsigned,
      `endYear` smallint unsigned,
      `runtimeMinutes` smallint unsigned,
      `genres` varchar(255)
    );
    -- Rows: 9,438,577  Data_length: 0.88GB
    
    CREATE TABLE `name_basics` (
      `nconst` int unsigned NOT NULL PRIMARY KEY,
      `primaryName` varchar(120) NOT NULL,
      `birthYear` smallint unsigned,
      `deathYear` smallint unsigned,
      `primaryProfession` varchar(80),
      `knownForTitles` varchar(80)
    );
    -- Rows: 13,042,955 Data_length: 0.89GB
    
    CREATE TABLE `title_principals` (
      `tconst` int unsigned NOT NULL,
      `ordering` smallint unsigned,
      `nconst` int unsigned NOT NULL,
      `category` enum('self','director','cinematographer','composer','producer','editor','actor','actress','writer','production_designer','archive_footage','archive_sound'),
      `job` varchar(1024),
      `characters` varchar(1024)
    );
    -- Rows: 57,378,319 Data_length: 3.06GB
    

    For these tests, the entire dataset was preloaded into the buffer pool.

    Base query

    This is my version of your query, given my slightly different schema (returns 603 rows):

    SELECT m.primaryTitle
    FROM title_basics m
    JOIN title_principals c ON m.tconst = c.tconst
    WHERE c.nconst IN (
        SELECT nconst
        FROM name_basics
        WHERE primaryName = 'Harrison Ford'
    );
    

    Executing this query without any additional indices, I get the following execution plan:

    id select_type table type possible_keys key key_len ref rows filtered Extra
    1 SIMPLE name_basics ALL PRIMARY 13119941 0.00 Using where
    1 SIMPLE c ALL 57598197 0.00 Using where; Using join buffer (hash join)
    1 SIMPLE m eq_ref PRIMARY PRIMARY 4 imdb.c.tconst 1 100.00

    Timing: 29.750 sec / 6.141 sec

    Rows accessed: 70,421,877 (quite a lot)

    This is bad news, as we have two full table scans (ALL in the type column) and they are of large tables (~13M & ~57M rows).

    Adding an index on primaryName

    ALTER TABLE `imdb`.`name_basics`
        ADD INDEX `idx_name` (`primaryName`);
    
    id select_type table type possible_keys key key_len ref rows filtered Extra
    1 SIMPLE name_basics ref PRIMARY,idx_name idx_name 482 const 4 100.00 Using index
    1 SIMPLE c ALL 57598197 0.00 Using where; Using join buffer (hash join)
    1 SIMPLE m eq_ref PRIMARY PRIMARY 4 imdb.c.tconst 1 100.00

    Timing: 25.438 sec / 6.328 sec

    Rows accessed: 57,378,926 (still a lot)

    Here we can see a dramatic improvement in the access of name_basics, instead of the full table scan seen in the first query, we are now using the index to access an estimated 4 rows (instead of ~13M), and it is not having to access the clustered index (the actual table data), as everything that is needed is in the secondary index (Using index in Extra column). Unfortunately, this is only a small improvement for the overall execution time as we are still having to scan the ~57M rows in title_principals.

    Adding indices for FKs

    A common approach is just to index the foreign keys, so let’s add single column indices for the FKs on title_principals:

    ALTER TABLE `imdb`.`title_principals`
        ADD INDEX `idx_tconst` (`tconst`),
        ADD INDEX `idx_nconst` (`nconst`);
    
    id select_type table type possible_keys key key_len ref rows filtered Extra
    1 SIMPLE name_basics ref PRIMARY,idx_name idx_name 482 const 4 100.00 Using index
    1 SIMPLE c ref idx_tconst,idx_nconst idx_nconst 4 imdb.name_basics.nconst 13 100.00
    1 SIMPLE m eq_ref PRIMARY PRIMARY 4 imdb.c.tconst 1 100.00

    Timing: 0.016 sec / 0.000 sec

    Rows accessed: 1,267 (that’s better)

    Here we can see we have got rid of the full table scan for title_principals and instead of having to scan ~57M rows, it is now estimating 13 rows.

    Using composite indices for the junction table

    We can take this a step further by adding composite indices instead:

    ALTER TABLE `imdb`.`title_principals`
        DROP INDEX `idx_tconst` (`tconst`),
        DROP INDEX `idx_nconst` (`nconst`),
        ADD INDEX `idx_tconst_nconst` (`tconst`, `nconst`),
        ADD INDEX `idx_nconst_tconst` (`nconst`, `tconst`);
    
    id select_type table type possible_keys key key_len ref rows filtered Extra
    1 SIMPLE name_basics ref PRIMARY,idx_name idx_name 482 const 4 100.00 Using index
    1 SIMPLE c ref idx_tconst_nconst,idx_nconst_tconst idx_nconst_tconst 4 imdb.name_basics.nconst 10 100.00 Using index
    1 SIMPLE m eq_ref PRIMARY PRIMARY 4 imdb.c.tconst 1 100.00

    Timing: 0.016 sec / 0.000 sec

    Rows accessed: 1,267

    The benefit here is less obvious, but it is still there. If you look at the Extra column, we now have Using index meaning –

    The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row.

    When checking the server-side execution time, averaged over 10 runs each, I got 0.007s (single column) and 0.003s (composite).

    This has ended up being much longer than I intended, but I hope it proves useful to someone.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search