Can anyone suggest a way to optimise the following query? It has multiple joins, and with a larger dataset of around 10M records it takes a long time to run.
SELECT
    AUDM.distributeTS AS distributionDate,
    ASMT.name,
    ACM.studentid,
    AUDM.users AS distributedTo,
    ROUND(AUM.totaluser * 100 / AUDM.users) AS participation,
    ROUND(AUM.score * 5 / (AUM.totaluser * AQM.qi)) AS performance
FROM
    (SELECT name, assessmentId
     FROM Assessment
     WHERE type IN ('PSYCHOMETRIC', 'QUANTITATIVE', '')
       AND removed = FALSE) ASMT
LEFT JOIN
    (SELECT studentid, assessmentId
     FROM AssessmentCreatorMap) ACM ON ACM.assessmentId = ASMT.AssessmentId
LEFT JOIN
    (SELECT assessmentId, COUNT(assessmentId) AS qi
     FROM AssessmentQuestionMap
     GROUP BY assessmentId) AQM ON AQM.assessmentId = ASMT.assessmentId
LEFT JOIN
    (SELECT COUNT(userId) AS users, distributeTS, assessmentId
     FROM AssessmentUserDistributeMap
     GROUP BY assessmentId) AUDM ON AUDM.assessmentId = ASMT.assessmentId
LEFT JOIN
    (SELECT assessmentId,
            COUNT(assessmentId) AS totaluser,
            SUM(assessmentScore) AS score
     FROM AssessmentUserMap
     JOIN Student ON AssessmentUserMap.studentId = Student.studentid
     WHERE enrollmentDate IS NOT NULL
       AND isDeleted = FALSE
     GROUP BY assessmentId) AUM ON AUM.assessmentId = ASMT.assessmentId
ORDER BY ASMT.AssessmentId DESC
LIMIT 0, 15;
EXPLAIN yields the following result:
'1', 'PRIMARY', 'Assessment', NULL, 'index', NULL, 'PRIMARY', '4', NULL, '1', '5.00', 'Using where; Backward index scan'
'1', 'PRIMARY', 'AssessmentCreatorMap', NULL, 'ref', 'fk_AssessmentCreatorMap_aid_idx', 'fk_AssessmentCreatorMap_aid_idx', '5', 'OustMe_UAT.Assessment.AssessmentId', '1', '100.00', NULL
'1', 'PRIMARY', '<derived4>', NULL, 'ref', '<auto_key0>', '<auto_key0>', '5', 'OustMe_UAT.Assessment.AssessmentId', '10', '100.00', NULL
'1', 'PRIMARY', '<derived5>', NULL, 'ref', '<auto_key0>', '<auto_key0>', '5', 'OustMe_UAT.Assessment.AssessmentId', '601', '100.00', NULL
'1', 'PRIMARY', '<derived6>', NULL, 'ref', '<auto_key0>', '<auto_key0>', '5', 'OustMe_UAT.Assessment.AssessmentId', '10', '100.00', NULL
'6', 'DERIVED', 'AssessmentUserMap', NULL, 'ALL', 'fk_AssessmentUserMap_assessmentid_idx,fk_aum_studentid,idx_AssessmentUserMap_assessmentId_enrollmentDate,idx_AssessmentUserMap_assessmentId_studentid', NULL, NULL, NULL, '1055', '90.00', 'Using where; Using temporary'
'6', 'DERIVED', 'Student', NULL, 'eq_ref', 'studentid_UNIQUE,idx_Student_studentid,fk_student_isdel', 'studentid_UNIQUE', '182', 'OustMe_UAT.AssessmentUserMap.studentid', '1', '50.00', 'Using index condition; Using where'
'5', 'DERIVED', 'AssessmentUserDistributeMap', NULL, 'index', 'fk_AssessmentUserDistributeMap_aid_idx,idx_AssessmentUserDistributeMap_assessmentId_userId,idx_assessmentUserDistributeMap_userId_assessmentId', 'fk_AssessmentUserDistributeMap_aid_idx', '5', NULL, '397282', '100.00', NULL
'4', 'DERIVED', 'AssessmentQuestionMap', NULL, 'index', 'fk_AssessmentQuestionMap_aid_idx', 'fk_AssessmentQuestionMap_aid_idx', '5', NULL, '3308', '100.00', 'Using index'
Most of the tables already have indexes. Please comment if any new index is needed, or on how we can rewrite the query to produce the same result set.
2 Answers
The query uses too many LEFT JOINs, which in my experience is very slow; try to avoid that. The solution to your problem is to apply the filter before doing any grouping on any table. You can first get the set of matching assessmentIds from the filtered Assessment subquery, then aggregate only those. As written, the query computes aggregates over the entire AssessmentQuestionMap table even though most of that work is discarded afterwards by the ON clause; if there are, say, 10,000 rows in AssessmentQuestionMap, it is not efficient to do all of that calculation. That is why the filter should go before any grouping in the query: add the filter before grouping.
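The answer's original queries did not survive extraction. As a sketch of the idea, using the table and column names from the question, the per-assessment aggregation can be restricted to the filtered assessments before the GROUP BY runs:

```sql
-- Sketch only: push the Assessment filter into the aggregating
-- subquery so non-matching rows are discarded before GROUP BY,
-- instead of aggregating everything and filtering via the ON clause.
SELECT aqm.assessmentId, COUNT(*) AS qi
FROM AssessmentQuestionMap aqm
JOIN Assessment a ON a.assessmentId = aqm.assessmentId
WHERE a.type IN ('PSYCHOMETRIC', 'QUANTITATIVE', '')
  AND a.removed = FALSE
GROUP BY aqm.assessmentId;
```

The same pattern applies to the AssessmentUserDistributeMap and AssessmentUserMap derived tables: join to (or filter by) the small set of qualifying assessmentIds first, then aggregate.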
Start by moving the first two tables away from being "derived" tables (subqueries in FROM or JOIN). Keep the derived tables that contain aggregates (COUNT ... GROUP BY). Then, adding composite indexes on the join and filter columns may help.
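The answer's original index list did not survive extraction. As hedged suggestions inferred from the query's join, filter, and grouping columns (names are illustrative, and some may duplicate indexes already visible in the EXPLAIN output), candidates might include:

```sql
-- Illustrative composite indexes; verify with EXPLAIN before keeping.
-- Covers the type/removed filter plus the join/order column:
ALTER TABLE Assessment
    ADD INDEX idx_assessment_type_removed (type, removed, assessmentId);
-- Lets the per-assessment distribution count come from the index alone:
ALTER TABLE AssessmentUserDistributeMap
    ADD INDEX idx_audm_aid_user_ts (assessmentId, userId, distributeTS);
-- Covers the per-assessment score aggregation:
ALTER TABLE AssessmentUserMap
    ADD INDEX idx_aum_aid_student_score (assessmentId, studentId, assessmentScore);
```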
Also, avoid using COUNT(x) unless the goal is to count only rows where x IS NOT NULL. Normally it is better to count rows via COUNT(*). (I made those changes; change them back if necessary.)

It is not possible to avoid a temp table and sort, because the IN has multiple items. (A really messy UNION could avoid that issue.)

My index suggestions are incomplete because I don't know which tables the various columns in the last join belong to.
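Putting those suggestions together, a rewritten query might look like the sketch below. Assumptions worth flagging: the original AUDM subquery selects distributeTS without aggregating it while grouping only by assessmentId, which is indeterminate in MySQL, so MAX(distributeTS) is used here as a guess at the intent; and based on the index names in the EXPLAIN output, enrollmentDate is assumed to live on AssessmentUserMap and isDeleted on Student. Verify both against the real schema.

```sql
-- Sketch: Assessment and AssessmentCreatorMap joined directly
-- (not as derived tables); aggregates kept in derived tables;
-- COUNT(*) instead of COUNT(column).
SELECT
    AUDM.distributeTS AS distributionDate,
    ASMT.name,
    ACM.studentid,
    AUDM.users AS distributedTo,
    ROUND(AUM.totaluser * 100 / AUDM.users) AS participation,
    ROUND(AUM.score * 5 / (AUM.totaluser * AQM.qi)) AS performance
FROM Assessment AS ASMT
LEFT JOIN AssessmentCreatorMap AS ACM
       ON ACM.assessmentId = ASMT.assessmentId
LEFT JOIN
    (SELECT assessmentId, COUNT(*) AS qi
     FROM AssessmentQuestionMap
     GROUP BY assessmentId) AS AQM
       ON AQM.assessmentId = ASMT.assessmentId
LEFT JOIN
    (SELECT assessmentId, COUNT(*) AS users,
            MAX(distributeTS) AS distributeTS   -- assumption: latest distribution
     FROM AssessmentUserDistributeMap
     GROUP BY assessmentId) AS AUDM
       ON AUDM.assessmentId = ASMT.assessmentId
LEFT JOIN
    (SELECT aum.assessmentId,
            COUNT(*) AS totaluser,
            SUM(aum.assessmentScore) AS score
     FROM AssessmentUserMap aum
     JOIN Student s ON aum.studentId = s.studentid
     WHERE aum.enrollmentDate IS NOT NULL   -- assumed table attribution
       AND s.isDeleted = FALSE              -- assumed table attribution
     GROUP BY aum.assessmentId) AS AUM
       ON AUM.assessmentId = ASMT.assessmentId
WHERE ASMT.type IN ('PSYCHOMETRIC', 'QUANTITATIVE', '')
  AND ASMT.removed = FALSE
ORDER BY ASMT.assessmentId DESC
LIMIT 0, 15;
```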