
I am trying to retrieve data using the query below, which runs forever. Basically, I am grouping at the student level, passing in parameters taken from rows of the t4 table.

t1 row count: 25 million

 CREATE TABLE `t1` (  
  `id` int NOT NULL AUTO_INCREMENT,
  `activity_date` datetime DEFAULT NULL,
  `book_title` varchar(200) DEFAULT NULL,
  `created` datetime DEFAULT NULL,
  `modified` datetime DEFAULT NULL,
  `passed` tinyint(1) NOT NULL,
  `points` int NOT NULL,
  `points_teacher` int DEFAULT NULL,
  `quiz_questions_correct` int DEFAULT NULL,
  `quiz_questions_issued` int DEFAULT NULL,
  `school_year` int DEFAULT NULL,
  `sequence_num` int DEFAULT NULL,
  `title` varchar(200) DEFAULT NULL,
  `type_id` int NOT NULL,
  `book_id` int DEFAULT NULL,
  `grade_id` int DEFAULT NULL,
  `src_quiz_id` int DEFAULT NULL,
  `student_user_id` int NOT NULL,
  `teacher_user_id` int DEFAULT NULL,
  `imported` tinyint(1) NOT NULL DEFAULT '0',
  `status` tinyint(1) DEFAULT NULL,
  `src_activity_type` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FK61A6FD28D741EFEC` (`teacher_user_id`),
  KEY `FK61A6FD2863C33D45` (`student_user_id`),
  KEY `FK61A6FD2845A8C289` (`book_id`),
  KEY `FK61A6FD28FD8BD6CB` (`grade_id`),
  KEY `FK61A6FD28F2826674` (`src_quiz_id`),
  KEY `ix_src_activity_activity_date` (`activity_date`),
  KEY `ix_src_activities_year_date_questions_correct` (`school_year`,`activity_date`,`quiz_questions_correct`),
  KEY `ix_status` (`status`),
  KEY `ix_quiz_questions_issued` (`quiz_questions_issued`),
  KEY `ix_type_id` (`type_id`),
  KEY `ix_src_created` (`created`),
  CONSTRAINT `FK61A6FD2845A8C289` FOREIGN KEY (`book_id`) REFERENCES `books` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
  CONSTRAINT `FK61A6FD2863C33D45` FOREIGN KEY (`student_user_id`) REFERENCES `users` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
  CONSTRAINT `FK61A6FD28D741EFEC` FOREIGN KEY (`teacher_user_id`) REFERENCES `users` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
  CONSTRAINT `FK61A6FD28F2826674` FOREIGN KEY (`src_quiz_id`) REFERENCES `src_quizzes` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
  CONSTRAINT `FK61A6FD28FD8BD6CB` FOREIGN KEY (`grade_id`) REFERENCES `grades` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB AUTO_INCREMENT=23841372 DEFAULT CHARSET=latin1

t2 row count: 27 million

CREATE TABLE `t2` (
  `src_activity_id` int NOT NULL,
  `lf_class_id` int NOT NULL,
  KEY `FK2159E1D8D21506D6` (`lf_class_id`),
  KEY `FK2159E1D8205CF734` (`src_activity_id`),
  KEY `idx_ClassId_Activity_ID` (`lf_class_id`,`src_activity_id`),
  KEY `idx_Activity_ID` (`src_activity_id`),
  CONSTRAINT `FK2159E1D8205CF734` FOREIGN KEY (`src_activity_id`) REFERENCES `src_activities` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
  CONSTRAINT `FK2159E1D8D21506D6` FOREIGN KEY (`lf_class_id`) REFERENCES `lf_classes` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB DEFAULT CHARSET=latin1 

t3 row count: 10k

CREATE TABLE `t3` (
  `id` int NOT NULL AUTO_INCREMENT,
  `avg_quiz_lexile` int DEFAULT NULL,
  `avg_quiz_score` int DEFAULT NULL,
  `created` datetime DEFAULT NULL,
  `lexile` int DEFAULT NULL,
  `lexile_fully_computed` tinyint(1) NOT NULL,
  `lexile_updated` datetime DEFAULT NULL,
  `modified` datetime DEFAULT NULL,
  `num_quiz_attempted` int DEFAULT NULL,
  `num_quiz_passed` int DEFAULT NULL,
  `points_earned` int DEFAULT NULL,
  `slz_id` varchar(200) DEFAULT NULL,
  `timezone_offset` int DEFAULT NULL,
  `words_read` int DEFAULT NULL,
  `school_id` int DEFAULT NULL,
  `school_group_id` int DEFAULT NULL,
  `active` tinyint(1) DEFAULT '1',
  `on_board_status` int DEFAULT NULL,
  `grade_code` varchar(255) DEFAULT NULL,
  `grade_name` varchar(255) DEFAULT NULL,
  `first_name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `last_name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
  `last_login` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `slz_id` (`slz_id`),
  UNIQUE KEY `users_unique_slz_id` (`slz_id`),
  KEY `FK6A68E08AB2ED262` (`school_group_id`),
  KEY `FK6A68E082A4052E9` (`school_id`),
  KEY `ix_users_active` (`active`),
  CONSTRAINT `FK6A68E082A4052E9` FOREIGN KEY (`school_id`) REFERENCES `schools` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
  CONSTRAINT `FK6A68E08AB2ED262` FOREIGN KEY (`school_group_id`) REFERENCES `school_groups` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB AUTO_INCREMENT=852699 DEFAULT CHARSET=latin1

t4 row count: can be 100k, but only 3000 records are passed in through the CTE

CREATE TABLE `t4` (
  `id` int NOT NULL AUTO_INCREMENT,
  `created` datetime NOT NULL,
  `modified` datetime NOT NULL,
  `school_id` int NOT NULL,
  `grade_id` int NOT NULL,
  `class_id` int NOT NULL,
  `student_id` int DEFAULT NULL,
  `activity_type` int NOT NULL,
  `school_year` int NOT NULL,
  `activity_date` datetime NOT NULL,
  `batch_time` datetime NOT NULL,
  `status` int NOT NULL,
  `batch_count` int NOT NULL,
  `version` int NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk_batch_class_report_etl` (`school_id`,`grade_id`,`activity_type`,`batch_time`,`activity_date`,`class_id`,`student_id`),
  KEY `FK_GRADES_JOIN_WITH_CLS` (`grade_id`),
  KEY `FK_CLASS_JOIN_WITH_CLS` (`class_id`),
  KEY `ix_class_etl_created` (`created`),
  KEY `ix_class_etl_activity_type` (`activity_type`),
  KEY `ix_class_etl_activity_date` (`activity_date`),
  KEY `ix_class_etl_batch_time` (`batch_time`),
  KEY `ix_class_etl_status` (`status`),
  KEY `idx_batch_class_etl_jobs_School_Year` (`school_year`),
  KEY `idx_Covering` (`status`,`activity_type`,`class_id`,`school_year`,`activity_date`),
  CONSTRAINT `FK_CLASS_JOIN_WITH_CLS` FOREIGN KEY (`class_id`) REFERENCES `lf_classes` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
  CONSTRAINT `FK_GRADES_JOIN_WITH_CLS` FOREIGN KEY (`grade_id`) REFERENCES `grades` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
  CONSTRAINT `FK_SCHOOLS_JOIN_WITH_REP_CLS` FOREIGN KEY (`school_id`) REFERENCES `schools` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB AUTO_INCREMENT=9032408 DEFAULT CHARSET=utf8mb3  

The query I am using to retrieve data

WITH cte AS (
  SELECT DISTINCT class_id, school_year, activity_type,
         DATE_ADD(DATE(activity_date), INTERVAL 1 DAY) activity_date
  FROM t4
  WHERE status = 2
    AND activity_type = 1
    AND activity_date <= '2022-06-30 00:00:00'
)
SELECT -- count(*)
  et.activity_date, et.class_id, et.school_year, u.id AS studentId,
  SUM(sa.quiz_Questions_Correct) AS quizQuestionsCorrect,
  SUM(sa.quiz_Questions_Issued) AS quizQuestionsIssued
FROM t1 sa
JOIN t2 salc ON sa.id = salc.src_activity_id
            AND sa.status <> 1
            AND sa.type_Id = 1
            AND sa.quiz_Questions_Issued IS NOT NULL
JOIN t3 u ON u.id = sa.student_User_ID
JOIN cte et ON et.school_year = sa.school_year
           AND sa.activity_Date <= et.activity_date
           AND et.class_id = salc.lf_class_id
WHERE sa.status <> 1
  AND sa.type_Id = 1
  AND sa.quiz_Questions_Issued IS NOT NULL
  AND sa.activity_Date <= et.activity_date
  AND sa.id = (SELECT MAX(sa_max.id)
               FROM t1 sa_max
               JOIN t2 salc_max ON sa_max.id = salc_max.src_activity_id
               WHERE sa_max.student_User_ID = sa.student_User_ID
                 AND sa_max.school_year = sa.school_year
                 AND sa_max.src_Quiz_ID = sa.src_Quiz_ID
                 AND sa_max.quiz_Questions_Issued IS NOT NULL
                 AND sa_max.type_Id = sa.type_Id
               -- ORDER BY sa_max.activityDate DESC, sa_max.modified DESC
              )
GROUP BY et.activity_date, u.id, et.class_id, et.school_year

How can I optimize it? The indexes are already included in the CREATE TABLE scripts above. I even tried the same query for a single line item from t4, but that takes 10 seconds too.

SELECT u.id,
       SUM(sa.quiz_Questions_Correct) quizQuestionsCorrect,
       SUM(sa.quiz_Questions_Issued) quizQuestionsIssued
FROM t1 sa
JOIN t2 salc ON sa.id = salc.src_activity_id
JOIN t3 u ON u.id = sa.student_User_ID
WHERE lf_class_id = 33226
  AND sa.status <> 1
  AND sa.type_Id = 1 -- + SrcActivity.ActivityType.QUIZ.getValue()
  AND sa.activity_Date <= '2022-08-05' -- DATE_ADD('2022-08-04', INTERVAL 1 DAY)
  AND sa.school_year = 2021
  AND sa.id = (SELECT MAX(sa_max.id)
               FROM t1 sa_max
               JOIN t2 salc_max ON sa_max.id = salc_max.src_activity_id
               WHERE sa_max.student_User_ID = sa.student_User_ID
                 AND salc_max.src_activity_id = sa.id
                 AND sa_max.school_year = sa.school_year
                 AND sa_max.src_Quiz_ID = sa.src_Quiz_ID
                 AND sa_max.quiz_Questions_Issued IS NOT NULL
                 AND sa_max.type_Id = sa.type_Id
              )
  AND sa.quiz_Questions_Issued IS NOT NULL
GROUP BY u.id

Explain JSON is below

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "2723644.99"
    },
    "grouping_operation": {
      "using_temporary_table": true,
      "using_filesort": false,
      "nested_loop": [
        {
          "table": {
            "table_name": "et",
            "access_type": "ALL",
            "rows_examined_per_scan": 8492,
            "rows_produced_per_join": 8492,
            "filtered": "100.00",
            "cost_info": {
              "read_cost": "108.65",
              "eval_cost": "849.20",
              "prefix_cost": "957.85",
              "data_read_per_join": "199K"
            },
            "used_columns": [
              "class_id",
              "school_year",
              "activity_type",
              "activity_date"
            ],
            "materialized_from_subquery": {
              "using_temporary_table": true,
              "dependent": false,
              "cacheable": true,
              "query_block": {
                "select_id": 2,
                "cost_info": {
                  "query_cost": "4648.00"
                },
                "duplicates_removal": {
                  "using_temporary_table": true,
                  "using_filesort": false,
                  "table": {
                    "table_name": "t4",
                    "access_type": "ref",
                    "possible_keys": [
                      "ix_class_etl_activity_type",
                      "ix_class_etl_activity_date",
                      "ix_class_etl_status",
                      "idx_Covering"
                    ],
                    "key": "idx_Covering",
                    "used_key_parts": [
                      "status",
                      "activity_type"
                    ],
                    "key_length": "8",
                    "ref": [
                      "const",
                      "const"
                    ],
                    "rows_examined_per_scan": 45098,
                    "rows_produced_per_join": 8492,
                    "filtered": "18.83",
                    "using_index": true,
                    "cost_info": {
                      "read_cost": "138.20",
                      "eval_cost": "849.26",
                      "prefix_cost": "4648.00",
                      "data_read_per_join": "530K"
                    },
                    "used_columns": [
                      "id",
                      "class_id",
                      "activity_type",
                      "school_year",
                      "activity_date",
                      "status"
                    ],
                    "attached_condition": "(`litpro`.`t4`.`activity_date` <= TIMESTAMP'2022-06-30 00:00:00')"
                  }
                }
              }
            }
          }
        },
        {
          "table": {
            "table_name": "salc",
            "access_type": "ref",
            "possible_keys": [
              "FK2159E1D8D21506D6",
              "FK2159E1D8205CF734",
              "idx_ClassId_Activity_ID",
              "idx_Activity_ID"
            ],
            "key": "idx_ClassId_Activity_ID",
            "used_key_parts": [
              "lf_class_id"
            ],
            "key_length": "4",
            "ref": [
              "et.class_id"
            ],
            "rows_examined_per_scan": 285,
            "rows_produced_per_join": 2424423,
            "filtered": "100.00",
            "using_index": true,
            "cost_info": {
              "read_cost": "12614.75",
              "eval_cost": "242442.35",
              "prefix_cost": "256014.95",
              "data_read_per_join": "36M"
            },
            "used_columns": [
              "src_activity_id",
              "lf_class_id"
            ]
          }
        },
        {
          "table": {
            "table_name": "sa",
            "access_type": "eq_ref",
            "possible_keys": [
              "PRIMARY",
              "FK61A6FD2863C33D45",
              "ix_src_activity_activity_date",
              "ix_src_activities_year_date_questions_correct",
              "ix_status",
              "ix_quiz_questions_issued",
              "ix_type_id"
            ],
            "key": "PRIMARY",
            "used_key_parts": [
              "id"
            ],
            "key_length": "4",
            "ref": [
              "litpro.salc.src_activity_id"
            ],
            "rows_examined_per_scan": 1,
            "rows_produced_per_join": 121221,
            "filtered": "5.00",
            "cost_info": {
              "read_cost": "2091844.39",
              "eval_cost": "12122.12",
              "prefix_cost": "2590301.69",
              "data_read_per_join": "55M"
            },
            "used_columns": [
              "id",
              "activity_date",
              "quiz_questions_correct",
              "quiz_questions_issued",
              "school_year",
              "type_id",
              "src_quiz_id",
              "student_user_id",
              "status"
            ],
            "attached_condition": "((`litpro`.`sa`.`school_year` = `et`.`school_year`) and (`litpro`.`sa`.`type_id` = 1) and (`litpro`.`sa`.`status` <> 1) and (`litpro`.`sa`.`quiz_questions_issued` is not null) and (`litpro`.`sa`.`activity_date` <= cast(`et`.`activity_date` as datetime)) and (`litpro`.`salc`.`src_activity_id` = (/* select#3 */ select max(`litpro`.`sa_max`.`id`) from `litpro`.`t1` `sa_max` join `litpro`.`t1_lf_classes` `salc_max` where ((`litpro`.`salc_max`.`src_activity_id` = `litpro`.`sa_max`.`id`) and (`litpro`.`sa_max`.`student_user_id` = `litpro`.`sa`.`student_user_id`) and (`litpro`.`sa_max`.`school_year` = `litpro`.`sa`.`school_year`) and (`litpro`.`sa_max`.`src_quiz_id` = `litpro`.`sa`.`src_quiz_id`) and (`litpro`.`sa_max`.`quiz_questions_issued` is not null) and (`litpro`.`sa_max`.`type_id` = `litpro`.`sa`.`type_id`)))) and (`litpro`.`sa`.`activity_date` <= cast(`et`.`activity_date` as datetime)) and (`litpro`.`sa`.`status` <> 1) and (`litpro`.`sa`.`quiz_questions_issued` is not null))",
            "attached_subqueries": [
              {
                "dependent": true,
                "cacheable": false,
                "query_block": {
                  "select_id": 3,
                  "cost_info": {
                    "query_cost": "84.15"
                  },
                  "nested_loop": [
                    {
                      "table": {
                        "table_name": "sa_max",
                        "access_type": "ref",
                        "possible_keys": [
                          "PRIMARY",
                          "FK61A6FD2863C33D45",
                          "FK61A6FD28F2826674",
                          "ix_src_activities_year_date_questions_correct",
                          "ix_quiz_questions_issued",
                          "ix_type_id"
                        ],
                        "key": "FK61A6FD2863C33D45",
                        "used_key_parts": [
                          "student_user_id"
                        ],
                        "key_length": "4",
                        "ref": [
                          "litpro.sa.student_user_id"
                        ],
                        "rows_examined_per_scan": 87,
                        "rows_produced_per_join": 0,
                        "filtered": "0.06",
                        "cost_info": {
                          "read_cost": "75.36",
                          "eval_cost": "0.01",
                          "prefix_cost": "84.09",
                          "data_read_per_join": "24"
                        },
                        "used_columns": [
                          "id",
                          "quiz_questions_issued",
                          "school_year",
                          "type_id",
                          "src_quiz_id",
                          "student_user_id"
                        ],
                        "attached_condition": "((`litpro`.`sa_max`.`school_year` = `litpro`.`sa`.`school_year`) and (`litpro`.`sa_max`.`src_quiz_id` = `litpro`.`sa`.`src_quiz_id`) and (`litpro`.`sa_max`.`quiz_questions_issued` is not null) and (`litpro`.`sa_max`.`type_id` = `litpro`.`sa`.`type_id`))"
                      }
                    },
                    {
                      "table": {
                        "table_name": "salc_max",
                        "access_type": "ref",
                        "possible_keys": [
                          "FK2159E1D8205CF734",
                          "idx_Activity_ID"
                        ],
                        "key": "FK2159E1D8205CF734",
                        "used_key_parts": [
                          "src_activity_id"
                        ],
                        "key_length": "4",
                        "ref": [
                          "litpro.sa_max.id"
                        ],
                        "rows_examined_per_scan": 1,
                        "rows_produced_per_join": 0,
                        "filtered": "100.00",
                        "using_index": true,
                        "cost_info": {
                          "read_cost": "0.05",
                          "eval_cost": "0.01",
                          "prefix_cost": "84.15",
                          "data_read_per_join": "0"
                        },
                        "used_columns": [
                          "src_activity_id"
                        ]
                      }
                    }
                  ]
                }
              }
            ]
          }
        },
        {
          "table": {
            "table_name": "u",
            "access_type": "eq_ref",
            "possible_keys": [
              "PRIMARY"
            ],
            "key": "PRIMARY",
            "used_key_parts": [
              "id"
            ],
            "key_length": "4",
            "ref": [
              "litpro.sa.student_user_id"
            ],
            "rows_examined_per_scan": 1,
            "rows_produced_per_join": 121221,
            "filtered": "100.00",
            "using_index": true,
            "cost_info": {
              "read_cost": "121221.18",
              "eval_cost": "12122.12",
              "prefix_cost": "2723644.99",
              "data_read_per_join": "327M"
            },
            "used_columns": [
              "id"
            ]
          }
        }
      ]
    }
  }
}

explain analyze for single line item query
Explain output of query

2 Answers


  1. That is quite an elaborate query. It is going to take several cycles of index design and optimization to get it right.

    If you say EXPLAIN ANALYZE SELECT ... to MySQL, or ANALYZE SELECT ... to MariaDB, the database server shows you the execution plan it actually used to satisfy the query. Ordinarily you would include this output in your question.
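
    As a rough sketch of what that looks like, assuming MySQL 8.0.18 or later (where EXPLAIN ANALYZE is available), you could prefix a simplified version of the single-class query from the question. The correlated MAX subquery is left out here only to keep the example short. Keep in mind that EXPLAIN ANALYZE really executes the statement, so it takes about as long as the query itself.

    ```sql
    -- Executes the query and reports actual row counts and timings for each plan step.
    -- Simplified from the question's single-class query (MAX subquery omitted for brevity).
    EXPLAIN ANALYZE
    SELECT u.id,
           SUM(sa.quiz_questions_correct) AS quizQuestionsCorrect,
           SUM(sa.quiz_questions_issued)  AS quizQuestionsIssued
    FROM t1 sa
    JOIN t2 salc ON sa.id = salc.src_activity_id
    JOIN t3 u    ON u.id = sa.student_user_id
    WHERE salc.lf_class_id = 33226
      AND sa.status <> 1
      AND sa.type_id = 1
      AND sa.school_year = 2021
      AND sa.activity_date <= '2022-08-05'
      AND sa.quiz_questions_issued IS NOT NULL
    GROUP BY u.id;
    ```

    Comparing the estimated and actual row counts in the output is usually the quickest way to spot the step where the optimizer's guess was far off.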

    The process of optimizing a query is cyclical and looks like this:

    1. Look at the execution plan and the query. If it’s fast enough you’re done.
    2. Add an index that looks like it might help.
    3. Go to step 1.

    Let’s start with your CTE. There’s an opportunity for an index there.

    select distinct
           class_id, school_year, activity_type,
           DATE_ADD(Date(activity_date), INTERVAL 1 DAY) activity_date
      from t4
     where status=2
       and activity_type=1
       and activity_date<='2022-06-30 00:00:00'
    

    The appropriate index to optimize this subquery is:

    CREATE INDEX cte_cover ON t4 (
            status, activity_type, activity_date,
            class_id, school_year);
    

    Indexes are created to satisfy particular long-running queries. Don’t create them "just in case." To work for a particular query, the index has to have its columns in the correct order.

    • First, all the columns with equality matching in WHERE
    • Next, one column for range matching.
    • Next, and this can sometimes be tricky, columns for ordering and aggregating.

    By the way, if you don’t need SELECT DISTINCT in this subquery, don’t use it. SELECT DISTINCT automatically turns the query into an aggregate query, and may invoke a costly sorting step. I think the index I suggested prevents that sorting step, but I’m not sure.

    See if that improves the situation. If it does, ask another question about the next optimization step. If not, please edit this question to show your execution plans.
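
    One way to check whether the cte_cover index is picked up is to run a plain EXPLAIN on the CTE body by itself. This is only a sketch; it reuses the date filter from the question's CTE, and the plan you get depends on your data and MySQL version.

    ```sql
    -- Plain EXPLAIN does not run the query; it only shows the plan the optimizer would choose.
    EXPLAIN
    SELECT DISTINCT
           class_id, school_year, activity_type,
           DATE_ADD(DATE(activity_date), INTERVAL 1 DAY) AS activity_date
    FROM t4
    WHERE status = 2
      AND activity_type = 1
      AND activity_date <= '2022-06-30 00:00:00';
    ```

    In the output, the key column names the index that was chosen. "Using index" in the Extra column means the index covers the query, and "Using temporary" means the DISTINCT is still being resolved through a temporary table.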

  2. Add the following two indexes on t1:

    (student_user_id, quiz_questions_correct)
    (student_user_id, quiz_questions_issued)
    

    Currently a double lookup is happening: your query accesses the t1 table via the student_user_id column, which it can find quickly thanks to the index on that column, but it then has to look up the main row in the table to get the quiz_questions_correct and quiz_questions_issued values.

    Adding the two above indexes will allow it to access t1 in just one lookup for each of those SUMs.
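
    Written out as statements, that could look roughly like the following. The index names are made up for illustration; use whatever fits your naming scheme.

    ```sql
    -- Hypothetical index names; the column lists are the two suggested above.
    CREATE INDEX ix_t1_student_correct ON t1 (student_user_id, quiz_questions_correct);
    CREATE INDEX ix_t1_student_issued  ON t1 (student_user_id, quiz_questions_issued);
    ```

    With about 25 million rows in t1, expect each index build to take a while. Afterwards, rerun EXPLAIN and look at the key column for t1 to see whether the optimizer actually chooses either index.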

    It’s a pretty complex query, but that jumped out to me pretty quickly as a possible major improvement.
