My table structure:
```sql
CREATE TABLE `jobs_view_stats` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `job_id` int(11) NOT NULL,
  `created_at` datetime NOT NULL,
  `account_id` int(11) DEFAULT NULL,
  `country` varchar(2) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_D05BC6799FDS15210` (`job_id`),
  KEY `FK_YTGBC67994591257` (`account_id`),
  KEY `jobs_view_stats_created_at_id_index` (`created_at`,`id`),
  CONSTRAINT `FK_YTGBC67994591257` FOREIGN KEY (`account_id`) REFERENCES `accounts` (`id`) ON DELETE SET NULL,
  CONSTRAINT `job_views_jobs_id_fk` FOREIGN KEY (`job_id`) REFERENCES `jobs` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=79976587 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='New jobs views system';
```
This is the query:
```sql
SELECT COUNT(id) AS view, job_id
FROM jobs_view_stats
WHERE jobs_view_stats.created_at BETWEEN '2022-11-01 00:00:00' AND '2022-11-30 23:59:59'
GROUP BY jobs_view_stats.job_id;
```
Execution plan:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|----|-------------|-------|------------|------|---------------|-----|---------|-----|------|----------|-------|
| 1 | SIMPLE | jobs_view_stats | null | range | IDX_D05BC6799FDS15210, jobs_view_stats_created_at_id_index | jobs_view_stats_created_at_id_index | 5 | null | 1584610 | 100 | Using index condition; Using MRR; Using temporary; Using filesort |
This query takes 4 minutes to complete; I want to reduce that to the minimum possible time.
2 Answers
In your execution plan, the range scan returns 1,584,610 rows that are then grouped, which in turn uses a temporary table to sort through and group them (slow).

`jobs_view_stats_created_at_id_index` also contains `id`, which makes the key cardinality excessively large; you could instead try adding `job_id` to that key, since that is what you are grouping by.

I think the main issue is that your WHERE clause matches over 1.5 million rows, which all have to be loaded into a temp table and then re-read in full to be grouped.

I think you need to bite off smaller chunks.
I'm going to assume you are using a programming language (like PHP) to invoke the DB calls. If so, you could first fetch the list of job_ids, then run smaller queries looping over the results of the first, batching the job_ids if there are many distinct ones; see the sketch below.
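Something along these lines (an untested sketch; the IN-list values are placeholders to be filled in from the first result set):

```sql
-- Step 1: fetch the distinct job_ids seen in the date range
SELECT DISTINCT job_id
FROM jobs_view_stats
WHERE created_at BETWEEN '2022-11-01 00:00:00' AND '2022-11-30 23:59:59';

-- Step 2: count views for one batch of job_ids at a time
SELECT COUNT(id) AS view, job_id
FROM jobs_view_stats
WHERE created_at BETWEEN '2022-11-01 00:00:00' AND '2022-11-30 23:59:59'
  AND job_id IN (101, 102, 103)  -- placeholder batch taken from step 1
GROUP BY job_id;
```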
For a pure MySQL resolution, I would create a temporary MEMORY table of all the job_ids, then join against it; see the sketch below.
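A sketch of that approach (the temp-table name `tmp_job_ids` is made up):

```sql
-- Collect the distinct job_ids for the range into a MEMORY temp table
CREATE TEMPORARY TABLE tmp_job_ids ENGINE=MEMORY AS
SELECT DISTINCT job_id
FROM jobs_view_stats
WHERE created_at BETWEEN '2022-11-01 00:00:00' AND '2022-11-30 23:59:59';

-- Count views per job_id by joining back through the temp table
SELECT COUNT(s.id) AS view, t.job_id
FROM tmp_job_ids AS t
JOIN jobs_view_stats AS s ON s.job_id = t.job_id
WHERE s.created_at BETWEEN '2022-11-01 00:00:00' AND '2022-11-30 23:59:59'
GROUP BY t.job_id;
```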
This is all totally untested, so there might be typos.
Change `COUNT(id)` to `COUNT(*)`, since it then does not need to check `id` for being NOT NULL.
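With that change the query reads:

```sql
SELECT COUNT(*) AS view, job_id
FROM jobs_view_stats
WHERE created_at BETWEEN '2022-11-01 00:00:00' AND '2022-11-30 23:59:59'
GROUP BY job_id;
```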
Add indexes that contain both `created_at` and `job_id`, so the query can be resolved entirely from an index.
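Presumably something like the following (the index names here are illustrative):

```sql
ALTER TABLE jobs_view_stats
  ADD INDEX idx_created_job (created_at, job_id),  -- illustrative name
  ADD INDEX idx_job_created (job_id, created_at);  -- illustrative name
```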
Either of those will be "covering". (I don't know which is better; the Optimizer will know.)
Drop the index on `(job_id)`, as it becomes redundant (the new `(job_id, created_at)` index starts with the same column). Unless you specifically need `(created_at, id)`, drop that one too.
For even more performance, build and maintain a Summary Table, then perform the query against it. Such a summary table might have `PRIMARY KEY (job_id, created_date)` and daily counts for each day for each job; a sketch follows.
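A minimal sketch, assuming a nightly job rolls up the previous day's rows (the table name `jobs_view_daily` and its columns are made up):

```sql
-- Hypothetical summary table: one row per job per day
CREATE TABLE jobs_view_daily (
  job_id       INT NOT NULL,
  created_date DATE NOT NULL,
  views        INT UNSIGNED NOT NULL,
  PRIMARY KEY (job_id, created_date)
);

-- Roll up one day's raw rows (re-runnable thanks to the upsert)
INSERT INTO jobs_view_daily (job_id, created_date, views)
SELECT job_id, DATE(created_at), COUNT(*)
FROM jobs_view_stats
WHERE created_at >= '2022-11-01' AND created_at < '2022-11-02'
GROUP BY job_id, DATE(created_at)
ON DUPLICATE KEY UPDATE views = VALUES(views);

-- The monthly report then scans at most ~30 small rows per job instead of millions
SELECT SUM(views) AS view, job_id
FROM jobs_view_daily
WHERE created_date BETWEEN '2022-11-01' AND '2022-11-30'
GROUP BY job_id;
```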