skip to Main Content

I’m using MySQL 8.0
Table NameL: sc (score)
Columns: student_id, course_id, score
Some dummy data:

create table sc(SId varchar(10),CId varchar(10),score decimal(18,1));
insert into sc values('01' , '01' , 80);
insert into sc values('01' , '02' , 90);
insert into sc values('01' , '03' , 99);
insert into sc values('02' , '01' , 70);
insert into sc values('02' , '02' , 60);
insert into sc values('02' , '03' , 80);
insert into sc values('03' , '01' , 80);
insert into sc values('03' , '02' , 80);
insert into sc values('03' , '03' , 80);
insert into sc values('04' , '01' , 50);
insert into sc values('04' , '02' , 30);
insert into sc values('04' , '03' , 20);
insert into sc values('05' , '01' , 76);
insert into sc values('05' , '02' , 87);
insert into sc values('06' , '01' , 31);
insert into sc values('06' , '03' , 34);
insert into sc values('07' , '02' , 89);
insert into sc values('07' , '03' , 98);

The question is: Output all the scores and the average scores of each student, and order the results in descending order of average score.

I came up with two solutions:

-- Solution 1
SELECT
  sc.*,
  AVG(score) OVER (PARTITION BY sid) AS avg_score
FROM sc
ORDER BY avg_score DESC
-- Solution 2
select *  from sc 
left join (
    select sid,avg(score) as avscore from sc 
    group by sid
    )r 
on sc.sid = r.sid
order by avscore desc;

But is there a difference in the performance of the efficiency of these two solutions if the table is really big?

Here’s the screenshot of EXPLAIN for these two queries:
enter image description here

2

Answers


  1. If you add this to the table, it is likely that Solution 2 might shine:

    INDEX(sid, score)
    

    I did a small test using your data and queries, but came up with an indeterminate result. (I checked the session status ‘Handler%’ values.)

    I prefer this variation of 2:

    select *
    FROM (
    select sid,avg(score) as avscore from sc
    group by sid
    ) AS r
    JOIN sc USING(sid)
    order by avscore desc;

    Suggest you try with a million rows.

    Login or Signup to reply.
  2. Here’s a technique for checking performance when you only have a small subset of the data:

    FLUSH STATUS;
    SELECT ...
    SHOW SESSION STATUS LIKE 'Handler%';
    

    — Solution 1

    mysql> flush status;
    Query OK, 0 rows affected (0.01 sec)
    
    mysql> SELECT
        ->   sc.*,
        ->   AVG(score) OVER (PARTITION BY sid) AS avg_score
        -> FROM sc
        -> ORDER BY avg_score DESC;
    +------+------+-------+-----------+
    | sid  | cid  | score | avg_score |
    +------+------+-------+-----------+
    | 07   | 03   |  98.0 |  93.50000 |
    | 07   | 02   |  89.0 |  93.50000 |
    | 01   | 03   |  99.0 |  89.66667 |
    | 01   | 01   |  80.0 |  89.66667 |
    | 01   | 02   |  90.0 |  89.66667 |
    | 05   | 01   |  76.0 |  81.50000 |
    | 05   | 02   |  87.0 |  81.50000 |
    | 03   | 01   |  80.0 |  80.00000 |
    | 03   | 02   |  80.0 |  80.00000 |
    | 03   | 03   |  80.0 |  80.00000 |
    | 02   | 01   |  70.0 |  70.00000 |
    | 02   | 02   |  60.0 |  70.00000 |
    | 02   | 03   |  80.0 |  70.00000 |
    | 04   | 01   |  50.0 |  33.33333 |
    | 04   | 03   |  20.0 |  33.33333 |
    | 04   | 02   |  30.0 |  33.33333 |
    | 06   | 01   |  31.0 |  32.50000 |
    | 06   | 03   |  34.0 |  32.50000 |
    +------+------+-------+-----------+
    18 rows in set (0.00 sec)
    

    Handlers:

    mysql> show session status like 'Handler%';
    +----------------------------+-------+
    | Variable_name              | Value |
    +----------------------------+-------+
    | Handler_commit             | 1     |
    | Handler_delete             | 0     |
    | Handler_discover           | 0     |
    | Handler_external_lock      | 2     |
    | Handler_mrr_init           | 0     |
    | Handler_prepare            | 0     |
    | Handler_read_first         | 1     |
    | Handler_read_key           | 1     |
    | Handler_read_last          | 0     |
    | Handler_read_next          | 0     |
    | Handler_read_prev          | 0     |
    | Handler_read_rnd           | 36    |
    | Handler_read_rnd_next      | 60    |
    | Handler_rollback           | 0     |
    | Handler_savepoint          | 0     |
    | Handler_savepoint_rollback | 0     |
    | Handler_update             | 0     |
    | Handler_write              | 36    |
    +----------------------------+-------+
    18 rows in set (0.00 sec)
    

    — Solution 2

    mysql> flush status;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> select *  from sc 
        -> left join (
        ->     select sid,avg(score) as avscore from sc 
        ->     group by sid
        ->     )r 
        -> on sc.sid = r.sid
        -> order by avscore desc;
    +------+------+-------+------+----------+
    | sid  | cid  | score | sid  | avscore  |
    +------+------+-------+------+----------+
    | 07   | 03   |  98.0 | 07   | 93.50000 |
    | 07   | 02   |  89.0 | 07   | 93.50000 |
    | 01   | 02   |  90.0 | 01   | 89.66667 |
    | 01   | 03   |  99.0 | 01   | 89.66667 |
    | 01   | 01   |  80.0 | 01   | 89.66667 |
    | 05   | 01   |  76.0 | 05   | 81.50000 |
    | 05   | 02   |  87.0 | 05   | 81.50000 |
    | 03   | 01   |  80.0 | 03   | 80.00000 |
    | 03   | 02   |  80.0 | 03   | 80.00000 |
    | 03   | 03   |  80.0 | 03   | 80.00000 |
    | 02   | 01   |  70.0 | 02   | 70.00000 |
    | 02   | 02   |  60.0 | 02   | 70.00000 |
    | 02   | 03   |  80.0 | 02   | 70.00000 |
    | 04   | 01   |  50.0 | 04   | 33.33333 |
    | 04   | 02   |  30.0 | 04   | 33.33333 |
    | 04   | 03   |  20.0 | 04   | 33.33333 |
    | 06   | 01   |  31.0 | 06   | 32.50000 |
    | 06   | 03   |  34.0 | 06   | 32.50000 |
    +------+------+-------+------+----------+
    18 rows in set (0.01 sec)
    

    Handlers:

    mysql> show session status like 'Handler%';
    +----------------------------+-------+
    | Variable_name              | Value |
    +----------------------------+-------+
    | Handler_commit             | 1     |
    | Handler_delete             | 0     |
    | Handler_discover           | 0     |
    | Handler_external_lock      | 4     |
    | Handler_mrr_init           | 0     |
    | Handler_prepare            | 0     |
    | Handler_read_first         | 2     |
    | Handler_read_key           | 20    |
    | Handler_read_last          | 0     |
    | Handler_read_next          | 36    |
    | Handler_read_prev          | 0     |
    | Handler_read_rnd           | 0     |
    | Handler_read_rnd_next      | 19    |
    | Handler_rollback           | 0     |
    | Handler_savepoint          | 0     |
    | Handler_savepoint_rollback | 0     |
    | Handler_update             | 0     |
    | Handler_write              | 7     |
    +----------------------------+-------+
    18 rows in set (0.01 sec)
    

    I deduce that the OVER approach is twice as slow.

    Windowing functions are useful in some situations where there is essentially no other way to achieve a goal, but I have not found them to be faster when there is an alternative.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search