I am trying to retrieve data using the query below, which runs forever. Basically, I am grouping at the student level, passing parameters taken from the rows of table t4.
t1 row count: 25 million
CREATE TABLE `t1` (
`id` int NOT NULL AUTO_INCREMENT,
`activity_date` datetime DEFAULT NULL,
`book_title` varchar(200) DEFAULT NULL,
`created` datetime DEFAULT NULL,
`modified` datetime DEFAULT NULL,
`passed` tinyint(1) NOT NULL,
`points` int NOT NULL,
`points_teacher` int DEFAULT NULL,
`quiz_questions_correct` int DEFAULT NULL,
`quiz_questions_issued` int DEFAULT NULL,
`school_year` int DEFAULT NULL,
`sequence_num` int DEFAULT NULL,
`title` varchar(200) DEFAULT NULL,
`type_id` int NOT NULL,
`book_id` int DEFAULT NULL,
`grade_id` int DEFAULT NULL,
`src_quiz_id` int DEFAULT NULL,
`student_user_id` int NOT NULL,
`teacher_user_id` int DEFAULT NULL,
`imported` tinyint(1) NOT NULL DEFAULT '0',
`status` tinyint(1) DEFAULT NULL,
`src_activity_type` int DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `FK61A6FD28D741EFEC` (`teacher_user_id`),
KEY `FK61A6FD2863C33D45` (`student_user_id`),
KEY `FK61A6FD2845A8C289` (`book_id`),
KEY `FK61A6FD28FD8BD6CB` (`grade_id`),
KEY `FK61A6FD28F2826674` (`src_quiz_id`),
KEY `ix_src_activity_activity_date` (`activity_date`),
KEY `ix_src_activities_year_date_questions_correct` (`school_year`,`activity_date`,`quiz_questions_correct`),
KEY `ix_status` (`status`),
KEY `ix_quiz_questions_issued` (`quiz_questions_issued`),
KEY `ix_type_id` (`type_id`),
KEY `ix_src_created` (`created`),
CONSTRAINT `FK61A6FD2845A8C289` FOREIGN KEY (`book_id`) REFERENCES `books` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
CONSTRAINT `FK61A6FD2863C33D45` FOREIGN KEY (`student_user_id`) REFERENCES `users` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
CONSTRAINT `FK61A6FD28D741EFEC` FOREIGN KEY (`teacher_user_id`) REFERENCES `users` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
CONSTRAINT `FK61A6FD28F2826674` FOREIGN KEY (`src_quiz_id`) REFERENCES `src_quizzes` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
CONSTRAINT `FK61A6FD28FD8BD6CB` FOREIGN KEY (`grade_id`) REFERENCES `grades` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB AUTO_INCREMENT=23841372 DEFAULT CHARSET=latin1
t2 row count: 27 million
CREATE TABLE `t2` (
`src_activity_id` int NOT NULL,
`lf_class_id` int NOT NULL,
KEY `FK2159E1D8D21506D6` (`lf_class_id`),
KEY `FK2159E1D8205CF734` (`src_activity_id`),
KEY `idx_ClassId_Activity_ID` (`lf_class_id`,`src_activity_id`),
KEY `idx_Activity_ID` (`src_activity_id`),
CONSTRAINT `FK2159E1D8205CF734` FOREIGN KEY (`src_activity_id`) REFERENCES `src_activities` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
CONSTRAINT `FK2159E1D8D21506D6` FOREIGN KEY (`lf_class_id`) REFERENCES `lf_classes` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB DEFAULT CHARSET=latin1
t3 row count: 10k
CREATE TABLE `t3` (
`id` int NOT NULL AUTO_INCREMENT,
`avg_quiz_lexile` int DEFAULT NULL,
`avg_quiz_score` int DEFAULT NULL,
`created` datetime DEFAULT NULL,
`lexile` int DEFAULT NULL,
`lexile_fully_computed` tinyint(1) NOT NULL,
`lexile_updated` datetime DEFAULT NULL,
`modified` datetime DEFAULT NULL,
`num_quiz_attempted` int DEFAULT NULL,
`num_quiz_passed` int DEFAULT NULL,
`points_earned` int DEFAULT NULL,
`slz_id` varchar(200) DEFAULT NULL,
`timezone_offset` int DEFAULT NULL,
`words_read` int DEFAULT NULL,
`school_id` int DEFAULT NULL,
`school_group_id` int DEFAULT NULL,
`active` tinyint(1) DEFAULT '1',
`on_board_status` int DEFAULT NULL,
`grade_code` varchar(255) DEFAULT NULL,
`grade_name` varchar(255) DEFAULT NULL,
`first_name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`last_name` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci DEFAULT NULL,
`last_login` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `slz_id` (`slz_id`),
UNIQUE KEY `users_unique_slz_id` (`slz_id`),
KEY `FK6A68E08AB2ED262` (`school_group_id`),
KEY `FK6A68E082A4052E9` (`school_id`),
KEY `ix_users_active` (`active`),
CONSTRAINT `FK6A68E082A4052E9` FOREIGN KEY (`school_id`) REFERENCES `schools` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
CONSTRAINT `FK6A68E08AB2ED262` FOREIGN KEY (`school_group_id`) REFERENCES `school_groups` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB AUTO_INCREMENT=852699 DEFAULT CHARSET=latin1
t4 row count: can reach 100k, but only about 3,000 rows are passed into the CTE
CREATE TABLE `t4` (
`id` int NOT NULL AUTO_INCREMENT,
`created` datetime NOT NULL,
`modified` datetime NOT NULL,
`school_id` int NOT NULL,
`grade_id` int NOT NULL,
`class_id` int NOT NULL,
`student_id` int DEFAULT NULL,
`activity_type` int NOT NULL,
`school_year` int NOT NULL,
`activity_date` datetime NOT NULL,
`batch_time` datetime NOT NULL,
`status` int NOT NULL,
`batch_count` int NOT NULL,
`version` int NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `uk_batch_class_report_etl` (`school_id`,`grade_id`,`activity_type`,`batch_time`,`activity_date`,`class_id`,`student_id`),
KEY `FK_GRADES_JOIN_WITH_CLS` (`grade_id`),
KEY `FK_CLASS_JOIN_WITH_CLS` (`class_id`),
KEY `ix_class_etl_created` (`created`),
KEY `ix_class_etl_activity_type` (`activity_type`),
KEY `ix_class_etl_activity_date` (`activity_date`),
KEY `ix_class_etl_batch_time` (`batch_time`),
KEY `ix_class_etl_status` (`status`),
KEY `idx_batch_class_etl_jobs_School_Year` (`school_year`),
KEY `idx_Covering` (`status`,`activity_type`,`class_id`,`school_year`,`activity_date`),
CONSTRAINT `FK_CLASS_JOIN_WITH_CLS` FOREIGN KEY (`class_id`) REFERENCES `lf_classes` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
CONSTRAINT `FK_GRADES_JOIN_WITH_CLS` FOREIGN KEY (`grade_id`) REFERENCES `grades` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT,
CONSTRAINT `FK_SCHOOLS_JOIN_WITH_REP_CLS` FOREIGN KEY (`school_id`) REFERENCES `schools` (`id`) ON DELETE RESTRICT ON UPDATE RESTRICT
) ENGINE=InnoDB AUTO_INCREMENT=9032408 DEFAULT CHARSET=utf8mb3
The query I am using to retrieve the data:
with cte as (
select distinct class_id, school_year, activity_type,
    DATE_ADD(Date(activity_date), INTERVAL 1 DAY) activity_date
from t4
where status=2 and activity_type=1 and activity_date<='2022-06-30 00:00:00'
)
SELECT -- count(*)
et.activity_date,et.class_id,et.school_year, u.id AS studentId,
SUM( sa.quiz_Questions_Correct) AS quizQuestionsCorrect
,SUM( sa.quiz_Questions_Issued) AS quizQuestionsIssued
FROM t1 sa
JOIN t2 salc ON sa.id=salc.src_activity_id and sa.status<>1 AND sa.type_Id = 1 AND sa.quiz_Questions_Issued IS NOT NULL
JOIN t3 u on u.id=sa.student_User_ID
JOIN cte et on et.school_year=sa.school_year AND sa.activity_Date <=et.activity_date AND et.class_id=salc.lf_class_id
WHERE
sa.status<>1
AND sa.type_Id = 1
AND sa.quiz_Questions_Issued IS NOT NULL
AND sa.activity_Date <=et.activity_date
AND sa.id = (SELECT MAX(sa_max.id) FROM t1 sa_max
JOIN t2 salc_max ON sa_max.id=salc_max.src_activity_id
WHERE sa_max.student_User_ID = sa.student_User_ID AND sa_max.school_year =sa.school_year
AND sa_max.src_Quiz_ID = sa.src_Quiz_ID AND sa_max.quiz_Questions_Issued IS NOT NULL AND sa_max.type_Id = sa.type_Id
-- ORDER BY sa_max.activityDate DESC, sa_max.modified DESC
)
GROUP BY et.activity_date, u.id,et.class_id,et.school_year
How can I optimize it? The indexes are already included in the CREATE TABLE scripts above. I even tried the same query for a single row from t4, but even that takes 10 seconds:
select u.id, SUM(sa.quiz_Questions_Correct) quizQuestionsCorrect,SUM(sa.quiz_Questions_Issued) quizQuestionsIssued
from t1 sa
JOIN t2 salc ON sa.id=salc.src_activity_id
JOIN t3 u on u.id=sa.student_User_ID
where lf_class_id = 33226 AND sa.status <> 1
AND sa.type_Id = 1 -- + SrcActivity.ActivityType.QUIZ.getValue()
AND sa.activity_Date <= '2022-08-05'-- DATE_ADD('2022-08-04', INTERVAL 1 DAY)
AND sa.school_year = 2021
AND sa.id = (SELECT MAX(sa_max.id) FROM t1 sa_max
JOIN t2 salc_max ON sa_max.id=salc_max.src_activity_id WHERE
sa_max.student_User_ID = sa.student_User_ID AND salc_max.src_activity_id =sa.id
AND sa_max.school_year = sa.school_year AND sa_max.src_Quiz_ID = sa.src_Quiz_ID AND sa_max.quiz_Questions_Issued IS NOT NULL AND sa_max.type_Id =sa.type_Id
)
AND sa.quiz_Questions_Issued IS NOT NULL group by u.id
EXPLAIN output (JSON format) for the main query:
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "2723644.99"
},
"grouping_operation": {
"using_temporary_table": true,
"using_filesort": false,
"nested_loop": [
{
"table": {
"table_name": "et",
"access_type": "ALL",
"rows_examined_per_scan": 8492,
"rows_produced_per_join": 8492,
"filtered": "100.00",
"cost_info": {
"read_cost": "108.65",
"eval_cost": "849.20",
"prefix_cost": "957.85",
"data_read_per_join": "199K"
},
"used_columns": [
"class_id",
"school_year",
"activity_type",
"activity_date"
],
"materialized_from_subquery": {
"using_temporary_table": true,
"dependent": false,
"cacheable": true,
"query_block": {
"select_id": 2,
"cost_info": {
"query_cost": "4648.00"
},
"duplicates_removal": {
"using_temporary_table": true,
"using_filesort": false,
"table": {
"table_name": "t4",
"access_type": "ref",
"possible_keys": [
"ix_class_etl_activity_type",
"ix_class_etl_activity_date",
"ix_class_etl_status",
"idx_Covering"
],
"key": "idx_Covering",
"used_key_parts": [
"status",
"activity_type"
],
"key_length": "8",
"ref": [
"const",
"const"
],
"rows_examined_per_scan": 45098,
"rows_produced_per_join": 8492,
"filtered": "18.83",
"using_index": true,
"cost_info": {
"read_cost": "138.20",
"eval_cost": "849.26",
"prefix_cost": "4648.00",
"data_read_per_join": "530K"
},
"used_columns": [
"id",
"class_id",
"activity_type",
"school_year",
"activity_date",
"status"
],
"attached_condition": "(`litpro`.`t4`.`activity_date` <= TIMESTAMP'2022-06-30 00:00:00')"
}
}
}
}
}
},
{
"table": {
"table_name": "salc",
"access_type": "ref",
"possible_keys": [
"FK2159E1D8D21506D6",
"FK2159E1D8205CF734",
"idx_ClassId_Activity_ID",
"idx_Activity_ID"
],
"key": "idx_ClassId_Activity_ID",
"used_key_parts": [
"lf_class_id"
],
"key_length": "4",
"ref": [
"et.class_id"
],
"rows_examined_per_scan": 285,
"rows_produced_per_join": 2424423,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "12614.75",
"eval_cost": "242442.35",
"prefix_cost": "256014.95",
"data_read_per_join": "36M"
},
"used_columns": [
"src_activity_id",
"lf_class_id"
]
}
},
{
"table": {
"table_name": "sa",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY",
"FK61A6FD2863C33D45",
"ix_src_activity_activity_date",
"ix_src_activities_year_date_questions_correct",
"ix_status",
"ix_quiz_questions_issued",
"ix_type_id"
],
"key": "PRIMARY",
"used_key_parts": [
"id"
],
"key_length": "4",
"ref": [
"litpro.salc.src_activity_id"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 121221,
"filtered": "5.00",
"cost_info": {
"read_cost": "2091844.39",
"eval_cost": "12122.12",
"prefix_cost": "2590301.69",
"data_read_per_join": "55M"
},
"used_columns": [
"id",
"activity_date",
"quiz_questions_correct",
"quiz_questions_issued",
"school_year",
"type_id",
"src_quiz_id",
"student_user_id",
"status"
],
"attached_condition": "((`litpro`.`sa`.`school_year` = `et`.`school_year`) and (`litpro`.`sa`.`type_id` = 1) and (`litpro`.`sa`.`status` <> 1) and (`litpro`.`sa`.`quiz_questions_issued` is not null) and (`litpro`.`sa`.`activity_date` <= cast(`et`.`activity_date` as datetime)) and (`litpro`.`salc`.`src_activity_id` = (/* select#3 */ select max(`litpro`.`sa_max`.`id`) from `litpro`.`t1` `sa_max` join `litpro`.`t1_lf_classes` `salc_max` where ((`litpro`.`salc_max`.`src_activity_id` = `litpro`.`sa_max`.`id`) and (`litpro`.`sa_max`.`student_user_id` = `litpro`.`sa`.`student_user_id`) and (`litpro`.`sa_max`.`school_year` = `litpro`.`sa`.`school_year`) and (`litpro`.`sa_max`.`src_quiz_id` = `litpro`.`sa`.`src_quiz_id`) and (`litpro`.`sa_max`.`quiz_questions_issued` is not null) and (`litpro`.`sa_max`.`type_id` = `litpro`.`sa`.`type_id`)))) and (`litpro`.`sa`.`activity_date` <= cast(`et`.`activity_date` as datetime)) and (`litpro`.`sa`.`status` <> 1) and (`litpro`.`sa`.`quiz_questions_issued` is not null))",
"attached_subqueries": [
{
"dependent": true,
"cacheable": false,
"query_block": {
"select_id": 3,
"cost_info": {
"query_cost": "84.15"
},
"nested_loop": [
{
"table": {
"table_name": "sa_max",
"access_type": "ref",
"possible_keys": [
"PRIMARY",
"FK61A6FD2863C33D45",
"FK61A6FD28F2826674",
"ix_src_activities_year_date_questions_correct",
"ix_quiz_questions_issued",
"ix_type_id"
],
"key": "FK61A6FD2863C33D45",
"used_key_parts": [
"student_user_id"
],
"key_length": "4",
"ref": [
"litpro.sa.student_user_id"
],
"rows_examined_per_scan": 87,
"rows_produced_per_join": 0,
"filtered": "0.06",
"cost_info": {
"read_cost": "75.36",
"eval_cost": "0.01",
"prefix_cost": "84.09",
"data_read_per_join": "24"
},
"used_columns": [
"id",
"quiz_questions_issued",
"school_year",
"type_id",
"src_quiz_id",
"student_user_id"
],
"attached_condition": "((`litpro`.`sa_max`.`school_year` = `litpro`.`sa`.`school_year`) and (`litpro`.`sa_max`.`src_quiz_id` = `litpro`.`sa`.`src_quiz_id`) and (`litpro`.`sa_max`.`quiz_questions_issued` is not null) and (`litpro`.`sa_max`.`type_id` = `litpro`.`sa`.`type_id`))"
}
},
{
"table": {
"table_name": "salc_max",
"access_type": "ref",
"possible_keys": [
"FK2159E1D8205CF734",
"idx_Activity_ID"
],
"key": "FK2159E1D8205CF734",
"used_key_parts": [
"src_activity_id"
],
"key_length": "4",
"ref": [
"litpro.sa_max.id"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 0,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "0.05",
"eval_cost": "0.01",
"prefix_cost": "84.15",
"data_read_per_join": "0"
},
"used_columns": [
"src_activity_id"
]
}
}
]
}
}
]
}
},
{
"table": {
"table_name": "u",
"access_type": "eq_ref",
"possible_keys": [
"PRIMARY"
],
"key": "PRIMARY",
"used_key_parts": [
"id"
],
"key_length": "4",
"ref": [
"litpro.sa.student_user_id"
],
"rows_examined_per_scan": 1,
"rows_produced_per_join": 121221,
"filtered": "100.00",
"using_index": true,
"cost_info": {
"read_cost": "121221.18",
"eval_cost": "12122.12",
"prefix_cost": "2723644.99",
"data_read_per_join": "327M"
},
"used_columns": [
"id"
]
}
}
]
}
}
}
[Screenshot: EXPLAIN ANALYZE output for the single-row query]
[Screenshot: EXPLAIN output of the query]
Answers
That is quite an elaborate query. It is going to take several cycles of index design and optimization to get it right.
If you run
EXPLAIN ANALYZE SELECT ...
in MySQL, or ANALYZE SELECT ...
in MariaDB, the database server shows you the execution plan it actually used to satisfy the query. Ordinarily you would include this output in your question. And the process of optimizing a query is cyclical and looks like this.
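As a rough, hedged illustration of that cycle (look at the plan, add an index, look at the plan again), here is a sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite has no EXPLAIN ANALYZE; its EXPLAIN QUERY PLAN shows the chosen plan without timings, so this demonstrates only the loop, not MySQL's cost model. The query mimics the predicates of the correlated sa_max subquery, which the JSON plan above resolves using only the student_user_id key part (and then "filtered": "0.06"). The index idx_student_year_quiz is hypothetical and was not suggested in this thread.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for one query."""
    # Each plan row has four columns; the fourth is the human-readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (
    id INTEGER PRIMARY KEY,
    student_user_id INTEGER NOT NULL,
    school_year INTEGER,
    src_quiz_id INTEGER,
    quiz_questions_issued INTEGER
);
-- Mirrors the existing single-column index on student_user_id.
CREATE INDEX idx_student ON t1 (student_user_id);
""")

QUERY = """SELECT id, quiz_questions_issued FROM t1
           WHERE student_user_id = ? AND school_year = ? AND src_quiz_id = ?"""
params = (10, 2021, 100)

# Step 1: inspect the plan. Only student_user_id narrows the search; the
# school_year and src_quiz_id predicates are checked row by row.
before = plan(conn, QUERY, params)

# Step 2: add an index aimed at this specific query (hypothetical).
conn.execute(
    "CREATE INDEX idx_student_year_quiz "
    "ON t1 (student_user_id, school_year, src_quiz_id)"
)

# Step 3: inspect the plan again; repeat until it stops improving.
after = plan(conn, QUERY, params)

print(before)
print(after)
```

On a real server you would close the loop with EXPLAIN ANALYZE and actual timings rather than plan text alone.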
Let’s start with your CTE. There’s an opportunity for an index there.
The appropriate index to optimize this subquery is:
Indexes are created to satisfy particular long-running queries. Don’t create them "just in case." To work for a particular query, the index has to have its columns in the correct order.
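To see why column order matters, here is a minimal SQLite sketch (again a stand-in for MySQL, with empty tables) that plans the filter from the single-row query, an equality on school_year plus a range on activity_date, against two hypothetical indexes holding the same columns in opposite orders. It is an illustration of the ordering rule only, not a reconstruction of the index recommended for the CTE.

```python
import sqlite3

# Filter shape from the single-row query: equality first, then a range.
QUERY = """SELECT SUM(quiz_questions_correct) FROM t1
           WHERE school_year = 2021 AND activity_date <= '2022-08-05'"""


def plan_with_index(index_sql):
    """Build a fresh t1 stand-in with exactly one secondary index and
    return SQLite's EXPLAIN QUERY PLAN detail text for QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE t1 (
        id INTEGER PRIMARY KEY,
        school_year INTEGER,
        activity_date TEXT,
        quiz_questions_correct INTEGER)""")
    conn.execute(index_sql)
    detail = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
    conn.close()
    return " | ".join(detail)


# Equality column first, range column second (the same order as the existing
# ix_src_activities_year_date_questions_correct index): both predicates
# narrow the index search.
good = plan_with_index(
    "CREATE INDEX idx_year_date ON t1 (school_year, activity_date)")

# Range column first: after the range on activity_date, school_year can no
# longer be used as a search key, so it is only checked row by row
# (or the planner falls back to scanning the table).
bad = plan_with_index(
    "CREATE INDEX idx_date_year ON t1 (activity_date, school_year)")

print(good)
print(bad)
```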
And, by the way, if you don’t need SELECT DISTINCT in this subquery, don’t use it. SELECT DISTINCT effectively turns the query into an aggregate query and may invoke a costly sorting step. I think the index I suggested prevents that sorting step, but I’m not sure.
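One way to check whether the DISTINCT is actually needed: t4's unique key includes student_id and batch_time, so several t4 rows can share the same class_id, school_year, activity_type and date. The minimal sketch below uses SQLite and hypothetical toy rows, and for brevity compares on the raw activity_date rather than the DATE_ADD(...) expression. If the two counts differ, dropping DISTINCT would join each class/date more than once and inflate the SUM() totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t4 (
    id INTEGER PRIMARY KEY,
    class_id INTEGER, school_year INTEGER, activity_type INTEGER,
    activity_date TEXT, student_id INTEGER, status INTEGER
);
-- Hypothetical rows: class 33226 appears twice for the same date (two students).
INSERT INTO t4 (class_id, school_year, activity_type, activity_date, student_id, status)
VALUES (33226, 2021, 1, '2022-06-01', 1, 2),
       (33226, 2021, 1, '2022-06-01', 2, 2),
       (40000, 2021, 1, '2022-06-01', 3, 2);
""")

# Same filter as the CTE (with the literal date simplified).
FILTER = ("FROM t4 WHERE status = 2 AND activity_type = 1 "
          "AND activity_date <= '2022-06-30'")

total_rows = conn.execute("SELECT COUNT(*) " + FILTER).fetchone()[0]
distinct_keys = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT class_id, school_year, "
    "activity_type, activity_date " + FILTER + ")"
).fetchone()[0]

print(total_rows, distinct_keys)  # 3 2
```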
See if that improves the situation. If it does, ask another question about the next optimization step. If not, please edit this question to show your execution plans.
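If you do come back for a next step, the correlated MAX(id) subquery is a likely candidate: the JSON plan marks it "dependent": true and "cacheable": false, so it is re-run for every candidate sa row. On MySQL 8.0 (which the utf8mb4_0900_ai_ci collation in t3 suggests you are running), a common alternative is to rank each (student, school year, quiz, type) group once with ROW_NUMBER(). The sketch below is a correctness check under stated assumptions, not a tuned rewrite: it uses Python's sqlite3 (SQLite 3.25 or newer, for window functions) as a stand-in for MySQL, models only the columns involved, and uses made-up rows, to show that both formulations keep the same "latest attempt" rows. The window version needs explicit IS NOT NULL filters because "=" in the correlated version never matches NULL, while PARTITION BY groups NULLs together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (
    id INTEGER PRIMARY KEY,
    student_user_id INTEGER NOT NULL,
    school_year INTEGER,
    src_quiz_id INTEGER,
    type_id INTEGER NOT NULL,
    quiz_questions_issued INTEGER
);
CREATE TABLE t2 (src_activity_id INTEGER NOT NULL, lf_class_id INTEGER NOT NULL);
INSERT INTO t1 VALUES
    (1, 10, 2021, 100, 1, 5),
    (2, 10, 2021, 100, 1, 5),
    (3, 10, 2021, 101, 1, 4),
    (4, 11, 2021, 100, 1, 5),
    (5, 11, 2021, 100, 1, NULL),
    (6, 11, 2021, 100, 1, 3);
-- Activity 2 is linked to two classes; activity 6 has no class link.
INSERT INTO t2 VALUES (1, 500), (2, 500), (2, 501), (3, 500), (4, 500), (5, 500);
""")

# Original pattern: correlated MAX(id) subquery, evaluated per outer row.
CORRELATED = """
SELECT sa.id FROM t1 sa
WHERE sa.quiz_questions_issued IS NOT NULL
  AND sa.id = (SELECT MAX(m.id) FROM t1 m
               JOIN t2 l ON m.id = l.src_activity_id
               WHERE m.student_user_id = sa.student_user_id
                 AND m.school_year = sa.school_year
                 AND m.src_quiz_id = sa.src_quiz_id
                 AND m.quiz_questions_issued IS NOT NULL
                 AND m.type_id = sa.type_id)
"""

# Window-function alternative: rank each group once, keep the highest id.
# EXISTS (instead of a join) stops multi-class links from duplicating rows.
WINDOWED = """
SELECT id FROM (
    SELECT sa.id,
           ROW_NUMBER() OVER (
               PARTITION BY sa.student_user_id, sa.school_year,
                            sa.src_quiz_id, sa.type_id
               ORDER BY sa.id DESC) AS rn
    FROM t1 sa
    WHERE sa.quiz_questions_issued IS NOT NULL
      AND sa.school_year IS NOT NULL
      AND sa.src_quiz_id IS NOT NULL
      AND EXISTS (SELECT 1 FROM t2 l WHERE l.src_activity_id = sa.id)
) AS ranked
WHERE rn = 1
"""

correlated_ids = sorted(row[0] for row in conn.execute(CORRELATED))
windowed_ids = sorted(row[0] for row in conn.execute(WINDOWED))
print(correlated_ids, windowed_ids)  # [2, 3, 4] [2, 3, 4]
```

Whether this is actually faster on a 25-million-row t1 is something only EXPLAIN ANALYZE on your own server can tell.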
Add the following two indexes on t1:
Currently you have a double lookup happening: your query accesses the t1 table via the student_user_id column, which it CAN find quickly thanks to the index on that column, BUT it then has to look up the main row in the table to get the quiz_questions_correct and quiz_questions_issued values. Adding the two indexes above will allow it to access t1 in just one lookup for each of those SUMs.
It’s a pretty complex query, but that jumped out at me pretty quickly as a possible major improvement.
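The exact index definitions this answer refers to are not shown here. As a hedged illustration of the underlying "one lookup" (covering index) idea only, the SQLite sketch below uses a single hypothetical index that contains both summed columns and shows the plan switching from an index search plus table lookups to a covering-index search. In MySQL's JSON plan the equivalent signal is "using_index": true, which appears above for salc and u but not for sa.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for one query."""
    return " | ".join(
        row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (
    id INTEGER PRIMARY KEY,
    student_user_id INTEGER NOT NULL,
    quiz_questions_correct INTEGER,
    quiz_questions_issued INTEGER
);
CREATE INDEX idx_student ON t1 (student_user_id);
""")

QUERY = """SELECT SUM(quiz_questions_correct), SUM(quiz_questions_issued)
           FROM t1 WHERE student_user_id = ?"""

# Single-column index: find matching index entries, then fetch each table
# row to read the two summed columns (the "double lookup").
before = plan(conn, QUERY, (10,))

# Hypothetical covering index: every column the query touches is in it,
# so the table rows never need to be visited.
conn.execute("DROP INDEX idx_student")
conn.execute(
    "CREATE INDEX idx_student_cover ON t1 "
    "(student_user_id, quiz_questions_correct, quiz_questions_issued)"
)
after = plan(conn, QUERY, (10,))

print(before)
print(after)
```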