skip to Main Content
select * 
from `attendance_marks` 
where exists (select * 
              from `attendables` 
              where `attendance_marks`.`attendable_id` = `attendables`.`id` 
              and `attendable_type` = 'student' 
              and `attendable_id` = 258672 
              and `attendables`.`deleted_at` is null
              ) 
and (`marked_at` between '2022-09-01 00:00:00' and '2022-09-30 23:59:59')

this query is taking too much time approx 7-10 seconds.
I am trying to optimize it but stuck at here.

Attendance_marks indexes
Indexs for attendance marks table

Attendables Indexes
Indexes for attendable table

Please help me optimize it a little bit.

For reference

number of rows in attendable = 80966

number of rows in attendance_marks = 1853696

Explain select
explain select

2

Answers


  1. I think if we use JOINS instead of Sub-Query, then it will be more performant. Unfortunately, I don’t have the exact data to be able to benchmark the performance.

    select * 
    from attendance_marks 
    inner join attendables on attendables.id = attendance_marks.attendable_id
    where attendable_type = 'student'
    and attendable_id = 258672
    and attendables.deleted_at is null
    and (marked_at between '2022-09-01 00:00:00' and '2022-09-30 23:59:59')
    

    I’m not sure if your business requirement allows changing the PK, and adding index. Incase it does then:

    1. Add index to attendable_id.
    2. I assume that attendables.id is PK. Incase not, add an index to it. Or preferably make it the PK.
    3. In case attendable_type have a lot of different values. Then consider adding an index there too.
    4. If possible don’t have granularity till the seconds’ field in marked_at, instead round to the nearest minute. In our case, we can round off 2022-09-30 23:59:59 to 2022-10-01 00:00:00.
    Login or Signup to reply.
  2. select  b.*
        from  `attendance_marks` AS am
        JOIN  `attendables` AS b  ON am.`attendable_id` = b.`id`
        WHERE  b.`attendable_type` = 'student'
          and  b.`attendable_id` = 258672
          and  b.`deleted_at` is null
          AND  am.`marked_at` >= '2022-09-01'
          AND  am.`marked_at` <  '2022-09-01 + INTERVAL 1 MONTH 
    

    and have these

    am:  INDEX(marked_at, attendable_id)
    am:  INDEX(attendable_id, marked_at)
    b:  INDEX(attendable_type, attendable_id, attendables)
    

    Note that the datetime range works for any granularity.

    (Be sure to check that I got the aliases for the correct tables.)

    This formulation, with these indexes should allow the Optimizer to pick which table is more efficient to start with.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search