I have a simple MySQL table which consists of a primary key ID field, a timestamp (integer) field, and a foreign key ID field (device_id). There are indexes on each of these columns:
mysql> show indexes from device_heartbeats;
+-------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table             | Non_unique | Key_name   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| device_heartbeats |          0 | PRIMARY    |            1 | id          | A         |     2603552 |     NULL | NULL   |      | BTREE      |         |               |
| device_heartbeats |          1 | IDX...bb8c |            1 | time        | A         |     1573451 |     NULL | NULL   |      | BTREE      |         |               |
| device_heartbeats |          1 | FKb...xi10 |            1 | device_id   | A         |          16 |     NULL | NULL   |      | BTREE      |         |               |
+-------------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
The table currently contains about 2.6 million records. Here is the count for each device:
mysql> select device_id, count(device_id) from device_heartbeats group by device_id;
+-----------+------------------+
| device_id | count(device_id) |
+-----------+------------------+
|         1 |           315833 |
|         2 |              589 |
|         3 |           851461 |
|         4 |             2115 |
|         5 |          1104668 |
|         6 |                6 |
|         7 |              409 |
|         8 |              783 |
|         9 |              778 |
|        10 |              772 |
|        11 |              211 |
|        12 |              333 |
|        13 |            57370 |
|        14 |            57121 |
|        15 |           217468 |
|        16 |               58 |
|        17 |               66 |
+-----------+------------------+
17 rows in set (0.26 sec)
I have a query that finds the most recent record using the timestamp field for a particular device_id (850k matching records):
mysql> select * from device_heartbeats where device_id = 3 order by time desc limit 1;
+---------+------------+-----------+
| id      | time       | device_id |
+---------+------------+-----------+
| 2610040 | 1697068792 |         3 |
+---------+------------+-----------+
1 row in set (0.00 sec)
The performance of this query is good; however, if I use another device_id (one with only about 2000 records) then the performance is poor:
mysql> select * from device_heartbeats where device_id = 4 order by time desc limit 1;
+-------+------------+-----------+
| id    | time       | device_id |
+-------+------------+-----------+
| 48451 | 1684888379 |         4 |
+-------+------------+-----------+
1 row in set (1.59 sec)
The performance is good for each device_id except for 4 and 5.
What is going on here, and how can I fix it so that the performance is always good?
2 Answers
The short answer
If you are only interested in the max time for a given device_id, then the lightest/fastest query is simply a MAX() lookup, with a composite key added to support it.
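A sketch of both, using the column names from the question (the index name idx_device_time is illustrative):

SELECT MAX(`time`)
FROM device_heartbeats
WHERE device_id = ?;

ALTER TABLE device_heartbeats
  ADD INDEX idx_device_time (device_id, `time`);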
The long answer
You have not included the EXPLAIN plan for these two queries but we can guess at what is happening with a reasonably high degree of certainty.
If you run your initial GROUP BY query with EXPLAIN, you will see that the entire query is performed against the index on device_id, which is why it is reasonably quick.
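The EXPLAIN being referred to (key names and row estimates will vary):

EXPLAIN SELECT device_id, COUNT(device_id)
FROM device_heartbeats
GROUP BY device_id;
-- expect: key = the index on device_id, Extra = Using index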
With your current single-column indices the optimizer is going to have to choose one of them. With your order by time desc limit 1 it is highly likely that it is going to choose the index on time. This is a great choice if there is a recent row for the given device_id, but not so great if it has to scan a large portion of the index and fetch lots of rows.

If you run the EXPLAIN for your query for device_id = 3, you will probably see it choose the index on time and report Using where; Backward index scan (see the sketch below). You will probably see a very similar EXPLAIN for device_id = 4, maybe with a higher but grossly under-estimated row count. The Backward index scan is going backwards through the index on time, reading the corresponding rows from the clustered index (primary key), until it finds a row with device_id = ?. If all the rows for a given device_id are from a "long time ago", it has to fetch a lot of rows. If there is a recent row, it does not have to go very far through the index before finding the first row for the given device_id.
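A sketch of that EXPLAIN:

EXPLAIN SELECT *
FROM device_heartbeats
WHERE device_id = 3
ORDER BY `time` DESC
LIMIT 1;
-- expect: key = the index on `time`, Extra = Using where; Backward index scan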
You could force the use of the index on device_id:
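For example (device_id_fk_idx is a placeholder for the actual name of the foreign key index, as shown by SHOW INDEXES):

SELECT *
FROM device_heartbeats FORCE INDEX (device_id_fk_idx)
WHERE device_id = 4
ORDER BY `time` DESC
LIMIT 1;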
This will be much faster for device_id = 4, as it only has to filesort 2115 rows, but it will be relatively slow for device_id = 3, as it has to filesort 851461 rows.

If you add a composite index (as suggested by Senthil P Nathan in the comments):
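For example, the same composite index as in the short answer (name illustrative):

ALTER TABLE device_heartbeats
  ADD INDEX idx_device_time (device_id, `time`);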
Note the change in the EXPLAIN Extra column from Using where; Backward index scan to Backward index scan; Using index. Because these are now simple index lookups, they should return in < 1ms.
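A sketch of the EXPLAIN once the composite index is in place:

EXPLAIN SELECT *
FROM device_heartbeats
WHERE device_id = 4
ORDER BY `time` DESC
LIMIT 1;
-- expect: key = idx_device_time, Extra = Backward index scan; Using index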
A slightly better option, assuming your table has just the three columns included in your question (id, time, device_id), is to remove the surrogate PK:
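A sketch of the resulting definition (column types are assumed, as they are not shown in the question):

CREATE TABLE device_heartbeats (
  `time` INT UNSIGNED NOT NULL,
  device_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (device_id, `time`)
  -- plus the existing FOREIGN KEY on device_id, omitted here
) ENGINE=InnoDB;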
Sensor data is best done with the PRIMARY KEY starting with the device_id. Start with the thorough discussion by @user1191247 and @Schwern. Now, let me wrap them together with this add-on.

I agree with getting rid of id altogether. But if you cannot trust the times to be unique for each device, then keep the auto_inc id and do this:
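A sketch of those keys, done in one ALTER so the AUTO_INCREMENT column always remains indexed:

ALTER TABLE device_heartbeats
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (device_id, `time`, id),  -- id breaks ties if two rows share a time
  ADD INDEX (id);                           -- keeps AUTO_INCREMENT happy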
The PK gives you the "clustering" that benefits many of the likely queries for this kind of data. And it does not really have any adverse impact on INSERTs, despite the inserts being very close to ‘chronological’.

The INDEX(id) is all that is needed to keep AUTO_INCREMENT happy.

The recommended SELECT MAX(`time`) FROM device_heartbeats WHERE device_id = 3; will touch only one row, taking only a few milliseconds, even if the necessary block(s) are not cached. The index is "covering" (EXPLAIN's "Using index") and will not need any filesort.

Even this will be nearly instantaneous for getting other columns:
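For example, the original query from the question, which now becomes a short backward scan within one device's PK range:

SELECT *
FROM device_heartbeats
WHERE device_id = 3
ORDER BY `time` DESC
LIMIT 1;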
The uneven distribution of device_id values won't affect these particular queries.

(A side note: If you plan to eventually delete ‘old’ data, I strongly recommend PARTITION BY RANGE. See Partition.)