
My table structure:

CREATE TABLE `jobs_view_stats` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `job_id` int(11) NOT NULL,
  `created_at` datetime NOT NULL,
  `account_id` int(11) DEFAULT NULL,
  `country` varchar(2) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_D05BC6799FDS15210` (`job_id`),
  KEY `FK_YTGBC67994591257` (`account_id`),
  KEY `jobs_view_stats_created_at_id_index` (`created_at`,`id`),
  CONSTRAINT `FK_YTGBC67994591257` FOREIGN KEY (`account_id`) REFERENCES `accounts` (`id`) ON DELETE SET NULL,
  CONSTRAINT `job_views_jobs_id_fk` FOREIGN KEY (`job_id`) REFERENCES `jobs` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=79976587 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='New jobs views system'

This is the query:

SELECT COUNT(id) as view, job_id
from jobs_view_stats
WHERE jobs_view_stats.created_at between '2022-11-01 00:00:00' AND '2022-11-30 23:59:59'
GROUP BY jobs_view_stats.job_id

Execution plan:

id: 1
select_type: SIMPLE
table: jobs_view_stats
partitions: NULL
type: range
possible_keys: IDX_D05BC6799FDS15210, jobs_view_stats_created_at_id_index
key: jobs_view_stats_created_at_id_index
key_len: 5
ref: NULL
rows: 1584610
filtered: 100
Extra: Using index condition; Using MRR; Using temporary; Using filesort

This query takes 4 minutes to complete. I want to reduce it to the minimum time possible.

2 Answers


  1. In your execution plan, the range scan returns 1584610 rows that then have to be grouped, which in turn uses a temporary table to sort and group them (slow).

    jobs_view_stats_created_at_id_index also contains 'id', which makes the key cardinality excessively large; you could instead try adding job_id to the key, since that is what you are grouping by.

    I think the main issue is that your WHERE clause returns over 1.5 million rows, all of which have to be loaded into a temporary table and then re-read in full to be grouped.

    You need to bite off smaller chunks, I think.

    I'm going to assume you are using a programming language (like PHP) to invoke the DB calls; if so, you could try

    SELECT DISTINCT job_id
    from jobs_view_stats
    WHERE jobs_view_stats.created_at between '2022-11-01 00:00:00' AND '2022-11-30 23:59:59'
    

    then, when you have the job_id list, do smaller queries looping over the results from the first:

    SELECT COUNT(*)
    FROM jobs_view_stats
    WHERE job_id = *theid*
      AND created_at BETWEEN '2022-11-01 00:00:00' AND '2022-11-30 23:59:59'
    

    or, if there are many different job_ids, batch them:

    SELECT job_id, COUNT(*)
    FROM jobs_view_stats
    WHERE job_id IN (id1, id2, id3, ...)
      AND created_at BETWEEN '2022-11-01 00:00:00' AND '2022-11-30 23:59:59'
    GROUP BY job_id
    

    For a pure MySQL solution, I would create a temporary MEMORY table of all the job_ids using

    CREATE TEMPORARY TABLE `temptable`
    (
      job_id INT PRIMARY KEY
    ) ENGINE=MEMORY;

    INSERT INTO `temptable`
    SELECT DISTINCT job_id
    FROM jobs_view_stats
    WHERE jobs_view_stats.created_at BETWEEN '2022-11-01 00:00:00' AND '2022-11-30 23:59:59';
    

    Then

    SELECT COUNT(*) AS view, job_id
    FROM jobs_view_stats
    WHERE job_id IN (SELECT job_id FROM `temptable`)
      AND created_at BETWEEN '2022-11-01 00:00:00' AND '2022-11-30 23:59:59'
    GROUP BY job_id
    

    This is all totally untested, so there might be typos.

  2. Change COUNT(id) to COUNT(*), since COUNT(*) does not need to check id for being NOT NULL.

    Add

    INDEX(job_id, created_at)
    INDEX(created_at, job_id)
    

    Either of those will be "covering". (I don’t know which is better; the Optimizer will know.)

    Drop the index (job_id), as it is redundant.
    Also drop (created_at, id), unless you specifically need it.
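
    Taken together, a sketch of these changes plus the COUNT(*) rewrite (untested; the new index names idx_job_created and idx_created_job are illustrative, not from the schema):

    ALTER TABLE jobs_view_stats
      ADD INDEX idx_job_created (job_id, created_at),
      ADD INDEX idx_created_job (created_at, job_id),
      -- job_id keeps a leftmost-prefix index via idx_job_created,
      -- so the FOREIGN KEY on job_id remains satisfied
      DROP INDEX `IDX_D05BC6799FDS15210`,
      -- drop (created_at, id) only if nothing else needs it
      DROP INDEX `jobs_view_stats_created_at_id_index`;

    SELECT COUNT(*) AS view, job_id
    FROM jobs_view_stats
    WHERE created_at BETWEEN '2022-11-01 00:00:00' AND '2022-11-30 23:59:59'
    GROUP BY job_id;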

    For even more performance, build and maintain a Summary Table. Then perform the query against it. Such a summary table might have PRIMARY KEY(job_id, created_date) and daily counts for each day for each job.
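
    A minimal sketch of such a summary table (untested; the names jobs_view_stats_daily, created_date, and views are illustrative):

    CREATE TABLE jobs_view_stats_daily (
      job_id INT NOT NULL,
      created_date DATE NOT NULL,
      views INT NOT NULL,
      PRIMARY KEY (job_id, created_date)
    ) ENGINE=InnoDB;

    -- Fold in the previous day's rows once per day:
    INSERT INTO jobs_view_stats_daily (job_id, created_date, views)
    SELECT job_id, DATE(created_at), COUNT(*)
    FROM jobs_view_stats
    WHERE created_at >= CURDATE() - INTERVAL 1 DAY
      AND created_at < CURDATE()
    GROUP BY job_id, DATE(created_at)
    ON DUPLICATE KEY UPDATE views = VALUES(views);

    -- The monthly report then reads one row per job per day:
    SELECT SUM(views) AS view, job_id
    FROM jobs_view_stats_daily
    WHERE created_date BETWEEN '2022-11-01' AND '2022-11-30'
    GROUP BY job_id;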
