
I have this query I’m running in MySQL:

SELECT
    count(*)
FROM
    library AS l
    JOIN plays AS p ON p.user_id = l.user_id AND
    l.path = p.path
WHERE
    l.user_id = 20977 AND
    p.time >= '2022-10-17';

When EXPLAIN ANALYZE is run:

| -> Aggregate: count(0)  (cost=1085653.55 rows=6692) (actual time=12576.265..12576.266 rows=1 loops=1)
    -> Nested loop inner join  (cost=1084984.37 rows=6692) (actual time=40.604..12566.569 rows=56757 loops=1)
        -> Index lookup on l using user_id_2 (user_id=20977)  (cost=116747.95 rows=106784) (actual time=13.153..3783.204 rows=59631 loops=1)
        -> Filter: ((p.user_id = 20977) and (p.`time` >= TIMESTAMP'2022-10-17 00:00:00'))  (cost=8.24 rows=0) (actual time=0.135..0.147 rows=1 loops=59631)
            -> Index lookup on p using path (path=l.`path`)  (cost=8.24 rows=8) (actual time=0.090..0.146 rows=1 loops=59631)
 |
1 row in set (12.76 sec)

I obviously want to make this faster!

Table definitions

CREATE TABLE `library` (
  `user_id` int NOT NULL,
  `name` varchar(20) COLLATE utf8mb4_general_ci NOT NULL,
  `path` varchar(512) COLLATE utf8mb4_general_ci NOT NULL,
  `title` varchar(512) COLLATE utf8mb4_general_ci NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `edited` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `db_id` int NOT NULL,
  `tag` varchar(64) COLLATE utf8mb4_general_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

CREATE TABLE `plays` (
  `user_id` int DEFAULT NULL,
  `name` varchar(20) CHARACTER SET utf8 DEFAULT NULL,
  `path` varchar(512) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `time` datetime DEFAULT CURRENT_TIMESTAMP,
  `play_id` int NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

ALTER TABLE `library`
  ADD PRIMARY KEY (`db_id`),
  ADD KEY `user_id_loc` (`user_id`,`name`,`path`(191)),
  ADD KEY `edited` (`edited`),
  ADD KEY `created` (`created`),
  ADD KEY `title` (`title`),
  ADD KEY `user_id` (`user_id`),
  ADD INDEX `user_id_by_title` (`user_id`, `title`);

ALTER TABLE `plays`
  ADD PRIMARY KEY (`play_id`),
  ADD KEY `user_id` (`user_id`,`name`,`path`(255)),
  ADD KEY `user_id_2` (`user_id`,`name`),
  ADD KEY `time` (`time`),
  ADD KEY `path` (`path`),
  ADD KEY `user_id_3` (`user_id`,`name`,`path`,`time`);

It looks like the killer is the looping over 59631 rows.

Would an index on (user_id, time) make it faster?
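
For reference, the candidate index would be created something like this (the name user_id_time is just a placeholder); I would then re-run the EXPLAIN ANALYZE above:

ALTER TABLE plays ADD INDEX user_id_time (user_id, time);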

Interestingly, the user_id_2 index that the optimizer picks is actually an index on (user_id, title), not the plain user_id index. I'm not sure why user_id_2 is chosen, given that title isn't used in the query.
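
As far as I know, MySQL auto-generates names like user_id_2 when an index is added without an explicit name and the name user_id is already taken, so the actual column list behind it can be double-checked with:

SHOW INDEX FROM library;
-- or, to see the full live definition:
SHOW CREATE TABLE library\G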

2 Answers


  1. I tested your query and tried a different index in each table.

    ALTER TABLE library ADD KEY bk1 (user_id, path); 
    
    ALTER TABLE plays ADD KEY bk2 (user_id, path, time); 
    
    EXPLAIN SELECT
        COUNT(*)
    FROM
        library AS l USE INDEX (bk1)
        JOIN plays AS p USE INDEX (bk2)
          ON p.user_id = l.user_id 
          AND l.path = p.path
    WHERE
        l.user_id = 20977 
        AND p.time >= '2022-10-17';
    
    +----+-------------+-------+------------+------+---------------+------+---------+-------------------+------+----------+--------------------------+
    | id | select_type | table | partitions | type | possible_keys | key  | key_len | ref               | rows | filtered | Extra                    |
    +----+-------------+-------+------------+------+---------------+------+---------+-------------------+------+----------+--------------------------+
    |  1 | SIMPLE      | l     | NULL       | ref  | bk1           | bk1  | 4       | const             |    1 |   100.00 | Using index              |
    |  1 | SIMPLE      | p     | NULL       | ref  | bk2           | bk2  | 2056    | const,test.l.path |    1 |   100.00 | Using where; Using index |
    +----+-------------+-------+------------+------+---------------+------+---------+-------------------+------+----------+--------------------------+
    

    The note "Using index" in each row of the EXPLAIN report shows that it’s getting the benefit of a covering index for both tables.

    I didn’t use prefix index syntax, because that would spoil the covering index optimization. Prefix indexes aren’t necessary for this example on modern MySQL versions, because they default to an InnoDB row format that supports index keys up to 3072 bytes, whereas the old default row format only supported 767 bytes.
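
    If you want to confirm that your tables use a row format that allows the longer index keys (DYNAMIC or COMPRESSED), something like this works:

    SELECT TABLE_NAME, ENGINE, ROW_FORMAT
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = DATABASE()
      AND TABLE_NAME IN ('library', 'plays');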

    In my test the tables had zero rows, so I had to use index hints to make the optimizer choose my new indexes. With a substantial number of rows in the tables, the optimizer might choose the new indexes on its own.
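
    Once the tables have real data, it's worth refreshing the statistics and re-checking the plan without the hints, e.g.:

    ANALYZE TABLE library, plays;

    EXPLAIN
    SELECT COUNT(*)
    FROM library AS l
    JOIN plays AS p
      ON p.user_id = l.user_id
     AND p.path = l.path
    WHERE l.user_id = 20977
      AND p.time >= '2022-10-17';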

  2. DROP these; they are in the way and/or redundant:

    l: `user_id` (`user_id`),
    p: `user_id` (`user_id`,`name`,`path`(255)),
    p: `user_id_2` (`user_id`,`name`),
    

    Add these:

    l:  INDEX(user_id,  path)
    p:  INDEX(user_id,  path, time)
    p:  INDEX(user_id,  time, path)   -- see below
    

    Change (MySQL 5.7/8.0 no longer needs the prefix kludge; concrete ALTER statements for all of these changes are sketched after this list):

    l:  `user_id_loc` (`user_id`,`name`,`path`)  -- tossing 191
    
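
    In concrete ALTER TABLE syntax, the above amounts to roughly this (index names are placeholders; pick your own):

    ALTER TABLE library
      DROP INDEX user_id,
      DROP INDEX user_id_loc,
      ADD INDEX user_id_name_path (user_id, name, path),   -- replaces user_id_loc, no 191 prefix
      ADD INDEX user_id_path (user_id, path);

    ALTER TABLE plays
      DROP INDEX user_id,
      DROP INDEX user_id_2,
      ADD INDEX user_id_path_time (user_id, path, time),
      ADD INDEX user_id_time_path (user_id, time, path);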

    Try to avoid testing columns from different tables in the WHERE clause.

    First I saw

        WHERE  l.user_id = 20977
          AND  p.time >= '2022-10-17';
    

    and assumed that was the crux of the problem. But then I saw that you did not have INDEX(user_id, time) on p and that the tables are joined [partially] on user_id.

    Suggest (to avoid my confusion) that you make this change:

        WHERE  l.user_id = 20977   -- >
        WHERE  p.user_id = 20977
    

    The Optimizer should be smart enough to realize that the two forms are equivalent (the JOIN already equates user_id), and then use

    p:  INDEX(user_id,  time, path)   -- as mentioned above
    

    But, once you have done that, the query collapses to

    SELECT COUNT(DISTINCT user_id, path)
        FROM plays
        WHERE  user_id = 20977
          AND  time >= '2022-10-17';
    

    I think EXPLAIN will say "Covering index skip scan for deduplication" to indicate that it is not actually scanning all 60K rows in plays, but hopping through the index!

    However, if there are rows in plays that do not have a corresponding entry in library, then the count will be too high by the number of those missing (user_id, path) combinations.
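
    If that over-count matters, one way to keep the check against library while still driving the query from plays is an EXISTS subquery. A sketch, assuming (user_id, path) is effectively unique in library (otherwise the original JOIN multiplies rows anyway):

    SELECT COUNT(*)
    FROM plays AS p
    WHERE p.user_id = 20977
      AND p.time >= '2022-10-17'
      AND EXISTS (SELECT 1
                    FROM library AS l
                   WHERE l.user_id = p.user_id
                     AND l.path = p.path);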
