Can someone explain why this mysql query takes forever to execute? Its like it's stuck in a loop with no results returned

DavidWright
November 29, 2023
204 views
0 votes
2 Answers

Can someone please explain why this mysql query takes forever to execute? Its like it’s stuck in a loop with no results returned.

Screenshot

SELECT m.title
FROM movies m
JOIN movie_starredin c ON m.ratingkey = c.movieid
WHERE c.actorid IN ( 
    SELECT id 
    FROM movie_actors 
    WHERE name = 'Harrison Ford'
);

Tags: mysql sql

Answers

- PhamXBach
- November 28, 2023 at 5:26 am
- 0 votes
0
You could "Debug" your query execution time by seperate it into smaller query:

1 . Run this and check exec time:
```
SELECT id 
FROM movie_actors 
WHERE name = 'Harrison Ford'
```
2 . Run this and check exec time:
```
SELECT *
FROM movie_starredin
WHERE actorid IN ( 
    SELECT id 
    FROM movie_actors 
    WHERE name = 'Harrison Ford'
);
```
3 . Run this and check exec time, with xxxxx is a value of movie_starredin.actorid:
```
SELECT m.title
FROM movies m
JOIN movie_starredin c ON m.ratingkey = c.movieid
WHERE c.actorid = xxxxx; 
```
Each step check if exec time too long/ slow. Then use index to reduce exec time of sql select (search google for this). But consider the tradeoff that is design of the table you want to add index because index could slow down insert, update action to table with large number (millions) of records.
Login or Signup to reply.

please explain why this mysql query takes forever to execute

We cannot answer this directly, as you have not included the DDL (SHOW CREATE TABLE tbl_name;) for your tables or the EXPLAIN output for your example query, but I think it is reasonable to assume that it is likely due to a lack of appropriate indices.

EXPLAIN is often the first port of call when wanting to understand the performance of a query. 8.8.2 EXPLAIN Output Format has lots of details on the output.

Test schema

This is my test schema with data imported from IMDb:

CREATE DATABASE `imdb` DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
USE `imdb`;

CREATE TABLE `title_basics` (
  `tconst` int unsigned NOT NULL PRIMARY KEY,
  `titleType` enum('short','movie','tvSeries','tvShort','tvMovie','tvEpisode','tvMiniSeries','tvSpecial','video','videoGame','tvPilot'),
  `primaryTitle` varchar(500) NOT NULL,
  `originalTitle` varchar(500) NOT NULL,
  `isAdult` tinyint(1) NOT NULL DEFAULT 0,
  `startYear` smallint unsigned,
  `endYear` smallint unsigned,
  `runtimeMinutes` smallint unsigned,
  `genres` varchar(255)
);
-- Rows: 9,438,577  Data_length: 0.88GB

CREATE TABLE `name_basics` (
  `nconst` int unsigned NOT NULL PRIMARY KEY,
  `primaryName` varchar(120) NOT NULL,
  `birthYear` smallint unsigned,
  `deathYear` smallint unsigned,
  `primaryProfession` varchar(80),
  `knownForTitles` varchar(80)
);
-- Rows: 13,042,955 Data_length: 0.89GB

CREATE TABLE `title_principals` (
  `tconst` int unsigned NOT NULL,
  `ordering` smallint unsigned,
  `nconst` int unsigned NOT NULL,
  `category` enum('self','director','cinematographer','composer','producer','editor','actor','actress','writer','production_designer','archive_footage','archive_sound'),
  `job` varchar(1024),
  `characters` varchar(1024)
);
-- Rows: 57,378,319 Data_length: 3.06GB

For these tests, the entire dataset was preloaded into the buffer pool.

Base query

This is my version of your query, given my slightly different schema (returns 603 rows):

SELECT m.primaryTitle
FROM title_basics m
JOIN title_principals c ON m.tconst = c.tconst
WHERE c.nconst IN (
    SELECT nconst
    FROM name_basics
    WHERE primaryName = 'Harrison Ford'
);

Executing this query without any additional indices, I get the following execution plan:

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	name_basics	ALL	PRIMARY				13119941	0.00	Using where
1	SIMPLE	c	ALL					57598197	0.00	Using where; Using join buffer (hash join)
1	SIMPLE	m	eq_ref	PRIMARY	PRIMARY	4	imdb.c.tconst	1	100.00

Timing: 29.750 sec / 6.141 sec

Rows accessed: 70,421,877 (quite a lot)

This is bad news, as we have two full table scans (ALL in the type column) and they are of large tables (~13M & ~57M rows).

Adding an index on `primaryName`

ALTER TABLE `imdb`.`name_basics`
    ADD INDEX `idx_name` (`primaryName`);

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	name_basics	ref	PRIMARY,idx_name	idx_name	482	const	4	100.00	Using index
1	SIMPLE	c	ALL					57598197	0.00	Using where; Using join buffer (hash join)
1	SIMPLE	m	eq_ref	PRIMARY	PRIMARY	4	imdb.c.tconst	1	100.00

Timing: 25.438 sec / 6.328 sec

Rows accessed: 57,378,926 (still a lot)

Here we can see a dramatic improvement in the access of name_basics, instead of the full table scan seen in the first query, we are now using the index to access an estimated 4 rows (instead of ~13M), and it is not having to access the clustered index (the actual table data), as everything that is needed is in the secondary index (Using index in Extra column). Unfortunately, this is only a small improvement for the overall execution time as we are still having to scan the ~57M rows in title_principals.

Adding indices for FKs

A common approach is just to index the foreign keys, so let’s add single column indices for the FKs on title_principals:

ALTER TABLE `imdb`.`title_principals`
    ADD INDEX `idx_tconst` (`tconst`),
    ADD INDEX `idx_nconst` (`nconst`);

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	name_basics	ref	PRIMARY,idx_name	idx_name	482	const	4	100.00	Using index
1	SIMPLE	c	ref	idx_tconst,idx_nconst	idx_nconst	4	imdb.name_basics.nconst	13	100.00
1	SIMPLE	m	eq_ref	PRIMARY	PRIMARY	4	imdb.c.tconst	1	100.00

Timing: 0.016 sec / 0.000 sec

Rows accessed: 1,267 (that’s better)

Here we can see we have got rid of the full table scan for title_principals and instead of having to scan ~57M rows, it is now estimating 13 rows.

Using composite indices for the junction table

We can take this a step further by adding composite indices instead:

ALTER TABLE `imdb`.`title_principals`
    DROP INDEX `idx_tconst` (`tconst`),
    DROP INDEX `idx_nconst` (`nconst`),
    ADD INDEX `idx_tconst_nconst` (`tconst`, `nconst`),
    ADD INDEX `idx_nconst_tconst` (`nconst`, `tconst`);

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	name_basics	ref	PRIMARY,idx_name	idx_name	482	const	4	100.00	Using index
1	SIMPLE	c	ref	idx_tconst_nconst,idx_nconst_tconst	idx_nconst_tconst	4	imdb.name_basics.nconst	10	100.00	Using index
1	SIMPLE	m	eq_ref	PRIMARY	PRIMARY	4	imdb.c.tconst	1	100.00

Timing: 0.016 sec / 0.000 sec

Rows accessed: 1,267

The benefit here is less obvious, but it is still there. If you look at the Extra column, we now have Using index meaning –

The column information is retrieved from the table using only information in the index tree without having to do an additional seek to read the actual row.

When checking the server-side execution time, averaged over 10 runs each, I got 0.007s (single column) and 0.003s (composite).

This has ended up being much longer than I intended, but I hope it proves useful to someone.

Please signup or login to give your own answer.

Click here to cancel reply.