skip to Main Content

Can anyone suggest me a way to optimise the given query? It has multiple joins and if we try it with larger dataset of 10M records. The query takes much amount of time.

SELECT 
    AUDM.distributeTS AS distributionDate,
    ASMT.name,
    ACM.studentid,
    AUDM.users AS distributedTo,
    ROUND(AUM.totaluser * 100 / AUDM.users) AS participation,
    ROUND(AUM.score * 5 / (AUM.totaluser * AQM.qi)) AS performance
FROM
    (SELECT 
        name, assessmentId
    FROM
        Assessment
    WHERE
        type IN ('PSYCHOMETRIC' , 'QUANTITATIVE', '')
            AND removed = FALSE) ASMT
        LEFT JOIN
    (SELECT 
        studentid, assessmentId
    FROM
        AssessmentCreatorMap) ACM ON ACM.assessmentId = ASMT.AssessmentId
        LEFT JOIN
    (SELECT 
        assessmentId, COUNT(assessmentId) AS qi
    FROM
        AssessmentQuestionMap
    GROUP BY assessmentId) AQM ON AQM.assessmentId = ASMT.assessmentId
        LEFT JOIN
    (SELECT 
         COUNT(userId) AS users, distributeTS, assessmentId
    FROM
        AssessmentUserDistributeMap 
    GROUP BY assessmentId) AUDM ON AUDM.assessmentId = ASMT.assessmentId
        LEFT JOIN
    (SELECT 
        assessmentId,
            COUNT(assessmentId) AS totaluser,
            SUM(assessmentScore) AS score
    FROM
        AssessmentUserMap
    JOIN Student ON AssessmentUserMap.studentId = Student.studentid
    WHERE
        enrollmentDate IS NOT NULL
            AND isDeleted = FALSE
    GROUP BY assessmentId) AUM ON AUM.assessmentId = ASMT.assessmentId
ORDER BY ASMT.AssessmentId DESC
LIMIT 0 , 15;

explain yields the following result.

'1', 'PRIMARY', 'Assessment', NULL, 'index', NULL, 'PRIMARY', '4', NULL, '1', '5.00', 'Using where; Backward index scan'
'1', 'PRIMARY', 'AssessmentCreatorMap', NULL, 'ref', 'fk_AssessmentCreatorMap_aid_idx', 'fk_AssessmentCreatorMap_aid_idx', '5', 'OustMe_UAT.Assessment.AssessmentId', '1', '100.00', NULL
'1', 'PRIMARY', '<derived4>', NULL, 'ref', '<auto_key0>', '<auto_key0>', '5', 'OustMe_UAT.Assessment.AssessmentId', '10', '100.00', NULL
'1', 'PRIMARY', '<derived5>', NULL, 'ref', '<auto_key0>', '<auto_key0>', '5', 'OustMe_UAT.Assessment.AssessmentId', '601', '100.00', NULL
'1', 'PRIMARY', '<derived6>', NULL, 'ref', '<auto_key0>', '<auto_key0>', '5', 'OustMe_UAT.Assessment.AssessmentId', '10', '100.00', NULL
'6', 'DERIVED', 'AssessmentUserMap', NULL, 'ALL', 'fk_AssessmentUserMap_assessmentid_idx,fk_aum_studentid,idx_AssessmentUserMap_assessmentId_enrollmentDate,idx_AssessmentUserMap_assessmentId_studentid', NULL, NULL, NULL, '1055', '90.00', 'Using where; Using temporary'
'6', 'DERIVED', 'Student', NULL, 'eq_ref', 'studentid_UNIQUE,idx_Student_studentid,fk_student_isdel', 'studentid_UNIQUE', '182', 'OustMe_UAT.AssessmentUserMap.studentid', '1', '50.00', 'Using index condition; Using where'
'5', 'DERIVED', 'AssessmentUserDistributeMap', NULL, 'index', 'fk_AssessmentUserDistributeMap_aid_idx,idx_AssessmentUserDistributeMap_assessmentId_userId,idx_assessmentUserDistributeMap_userId_assessmentId', 'fk_AssessmentUserDistributeMap_aid_idx', '5', NULL, '397282', '100.00', NULL
'4', 'DERIVED', 'AssessmentQuestionMap', NULL, 'index', 'fk_AssessmentQuestionMap_aid_idx', 'fk_AssessmentQuestionMap_aid_idx', '5', NULL, '3308', '100.00', 'Using index'

most of the tables have the indexes already . Please comment if there is any need to add a new index or how can we rewrite the query to produce the same resultset.

2

Answers


  1. The query is using too many left join and from my experience it’s very slow. Try avoiding it.

    Solution for your problem is doing filter before you doing any grouping for any table.

    you can get and save assessmentId from below query

        with x as (   SELECT name, assessmentId
        FROM   Assessment
        WHERE
        type IN ('PSYCHOMETRIC' , 'QUANTITATIVE', '')
        AND removed = FALSE);
    
        SELECT 
        COUNT(userId) AS users, distributeTS, assessmentId
        FROM
        AssessmentUserDistributeMap 
        where assessmentId in (select assessmentId from x) -- add this block in every your grouping
        GROUP BY assessmentId
    

    updated query:

        WITH ASMT as (SELECT name, assessmentId
            FROM  Assessment
            WHERE type IN ('PSYCHOMETRIC' , 'QUANTITATIVE', '')
            AND removed = FALSE)            
        SELECT 
            AUDM.distributeTS AS distributionDate,
            ASMT.name,
            ACM.studentid,
            AUDM.users AS distributedTo,
            ROUND(AUM.totaluser * 100 / AUDM.users) AS participation,
            ROUND(AUM.score * 5 / (AUM.totaluser * AQM.qi)) AS performance
        FROM ASMT
            LEFT JOIN
            (SELECT studentid, assessmentId
            FROM AssessmentCreatorMap Where assessmentId in (select assessmentId from ASMT)) ACM ON ACM.assessmentId = ASMT.AssessmentId
            LEFT JOIN
            (SELECT assessmentId, COUNT(assessmentId) AS qi
            FROM AssessmentQuestionMap  Where assessmentId in (select assessmentId from ASMT)
            GROUP BY assessmentId) AQM ON AQM.assessmentId = ASMT.assessmentId
            LEFT JOIN 
            (SELECT COUNT(userId) AS users, distributeTS, assessmentId
            FROM AssessmentUserDistributeMap Where assessmentId in (select assessmentId from ASMT)
            GROUP BY assessmentId) AUDM ON AUDM.assessmentId = ASMT.assessmentId
            LEFT JOIN
            (SELECT assessmentId, COUNT(assessmentId) AS totaluser, SUM(assessmentScore) AS score
            FROM AssessmentUserMap
            JOIN Student ON AssessmentUserMap.studentId = Student.studentid
            WHERE enrollmentDate IS NOT NULL 
            AND isDeleted = FALSE
            and assessmentId in (select assessmentId from ASMT)
            GROUP BY assessmentId) AUM ON AUM.assessmentId = ASMT.assessmentId
        ORDER BY ASMT.AssessmentId DESC
        LIMIT 0 , 15;
    

    Below query will calculate all data on table AssessmentQuestionMap, even though we will not using all the result data, because there’s will be filter by ON clause, let say there’s 10000 record data on table AssessmentQuestionMap, it’s not efficient to do all the calculation, that’s why we put the filter before any grouping calculation on the query

        LEFT JOIN
        (SELECT assessmentId, COUNT(assessmentId) AS qi
        FROM AssessmentQuestionMap
        GROUP BY assessmentId) AQM 
    

    Add the filter before grouping

        LEFT JOIN
        (SELECT assessmentId, COUNT(assessmentId) AS qi
        FROM AssessmentQuestionMap  Where assessmentId in (select assessmentId from ASMT)
        GROUP BY assessmentId) AQM ON AQM.assessmentId = ASMT.assessmentId
    
    Login or Signup to reply.
  2. Start by moving the first two tables away from being ‘derived’ tables (‘subqueries in FROM or JOIN’). Keep the derived tables with aggregates (COUNT...GROUP BY).

    SELECT  AUDM.distributeTS AS distributionDate, ASMT.name, ACM.studentid,
            AUDM.users AS distributedTo, ROUND(AUM.totaluser * 100 / AUDM.users) AS participation,
            ROUND(AUM.score * 5 / (AUM.totaluser * AQM.qi)) AS performance
        FROM  Assessment AS ASMT
        JOIN  AssessmentCreatorMap AS ACM  ON ACM.assessmentId = ASMT.AssessmentId 
        LEFT JOIN  
        (
            SELECT  assessmentId, COUNT(*) AS qi
                FROM  AssessmentQuestionMap
                GROUP BY  assessmentId
        ) AQM  ON AQM.assessmentId = ASMT.assessmentId
        LEFT JOIN  
        (
            SELECT  COUNT(*) AS users, distributeTS, assessmentId
                FROM  AssessmentUserDistributeMap
                GROUP BY  assessmentId
        ) AUDM  ON AUDM.assessmentId = ASMT.assessmentId
        LEFT JOIN  
        (
            SELECT  assessmentId, COUNT(*) AS totaluser, SUM(assessmentScore) AS score
                FROM  AssessmentUserMap
                JOIN  Student  ON AssessmentUserMap.studentId = Student.studentid
                WHERE  enrollmentDate IS NOT NULL
                  AND  isDeleted = FALSE
                GROUP BY  assessmentId
        ) AUM  ON AUM.assessmentId = ASMT.assessmentId
        WHERE  ASMT.type IN ('PSYCHOMETRIC' , 'QUANTITATIVE', '')
          AND  ASMT.removed = FALSE
         ORDER BY  ASMT.AssessmentId DESC
        LIMIT  0 , 15;
    

    Then, some of these indexes may help:

    AUDM:  INDEX(assessmentId,  distributeTS, users)
    ASMT:  INDEX(removed, type,  name, assessmentId)
    ACM:  INDEX(assessmentId,  studentid)
    AUM:  INDEX(assessmentId,  totaluser, score)
    AQM:  INDEX(assessmentId,  qi)
    AssessmentUserMap:  INDEX(studentId)
    Student:  INDEX(studentid)
    

    Also, avoid using COUNT(x) if unless the goal is to count only rows with x IS NOT NULL. Normally, it is better to "count rows" via COUNT(*). (I made those changes; change them back if necessary.)

    It is not possible to avoid a temp table and sort. This is because the IN has multiple items. (A really messy UNION could avoid that issue.)

    My index suggestions is incomplete because I don’t know which tables the various columns are in in the last join.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search