I’m writing a MySQL query for a Grafana dashboard, but this particular query performs very poorly.
This is the query:
```sql
SELECT
    br.reading
    ,CONCAT(ino.project, '_', SUBSTRING_INDEX(br.sensor, '_', -1)) AS new_metric
    ,ino.vessel
    ,ADDTIME(DATE('2023-01-01'), -TIMEDIFF(ino.date, br.datetime)) AS norm_date
    ,br.sensor
FROM db.milestones br
INNER JOIN db.projectdb ino
    ON br.datetime BETWEEN ino.date AND ino.enddate
WHERE ino.project IN ('project1', 'project2', 'project3', 'project4')
    AND br.sensor LIKE CONCAT(ino.vessel, '%');
```
`EXPLAIN` output:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
| -- | ----------- | ----- | ---------- | ---- | ------------- | --- | ------- | --- | -------- | -------- | ---------------------------------------------- |
| 1 | SIMPLE | ino | | ALL | | | | | 5 | 100.00 | |
| 1 | SIMPLE | br | | ALL | idx_datetime | | | | 31865381 | 1.23 | Range checked for each record (index map: 0x2) |
Table creation:
```sql
CREATE TABLE `projectdb` (
  `project` varchar(10) DEFAULT NULL,
  `date` datetime DEFAULT NULL,
  `enddate` datetime DEFAULT ((`date` + interval 3 day)),
  `vessel` varchar(45) DEFAULT NULL,
  UNIQUE KEY `project_UNIQUE` (`project`),
  KEY `idx_project` (`project`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
```
```sql
CREATE TABLE `milestones` (
  `id` int NOT NULL AUTO_INCREMENT,
  `datetime` datetime DEFAULT CURRENT_TIMESTAMP,
  `sensor` varchar(20) DEFAULT NULL,
  `reading` decimal(10,4) DEFAULT NULL,
  `monitor` tinyint DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_datetime` (`datetime` DESC) USING BTREE,
  KEY `idx_sensors` (`sensor`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=31649068 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
```
indexes:
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
| ---------- | ---------- | -------------- | ------------ | ----------- | --------- | ----------- | -------- | ------ | ---- | ---------- | ------- | ------------- | ------- | ---------- |
| projectdb | 0 | project_UNIQUE | 1 | project | A | 4 | | | YES | BTREE | | | YES | |
| projectdb | 1 | idx_project | 1 | project | A | 5 | | | YES | BTREE | | | YES | |
| milestones | 0 | PRIMARY | 1 | id | A | 31648118 | | | | BTREE | | | YES | |
| milestones | 1 | idx_datetime | 1 | datetime | D | 1723411 | | | YES | BTREE | | | YES | |
| milestones | 1 | idx_sensors | 1 | sensor | A | 21924 | | | YES | BTREE | | | YES | |
projectdb table sample:
| project | date | enddate | vessel |
| --------- | ---------------- | ---------------- | ------ |
| Project 1 | 24/11/2023 17:30 | 27/11/2023 17:30 | V1 |
| Project 2 | 17/11/2023 19:50 | 20/11/2023 19:50 | V1 |
| Project 3 | 27/10/2023 16:00 | 30/10/2023 16:00 | V2 |
milestones table sample:
| id | datetime | sensor | reading |
| -------- | ---------------- | -------- | ------- |
| 22117821 | 10/10/2023 19:20 | V1_FT001 | 100 |
| 22118005 | 10/10/2023 19:21 | V2_FT001 | 120 |
| 22118189 | 10/10/2023 19:23 | V1_FT001 | 100 |
| 22117835 | 10/10/2023 19:20 | V3_FT001 | 105 |
Looking at the execution plan, it’s evident that the indexes on db.milestones.sensor and db.milestones.datetime are not being used. This means the full table scan has to go through all ~32 million rows, which explains the poor performance. I’m assuming that because this is time-series data, with the same 81 sensors repeating over and over, the optimizer considers the index unselective and is less likely to use it.
Is there a way to modify this query so the index is always used, or is it a matter of taking an entirely different approach if I want to improve query speeds?
I tried using a subquery in the first join to apply the `WHERE` clause before joining, so unnecessary data would be filtered out first. This approach ended up performing the worst, taking up to 2 minutes.
I tried creating a view of the db.milestones table containing the substring that the `WHERE` clause compares against, so I could use `=` instead of `LIKE`. This also did not go as expected; the performance was very bad as well.
I tried using `FORCE INDEX` to force the index, but this didn’t work either.
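The exact hinted statement isn’t shown; applied to the query above, a `FORCE INDEX` attempt would presumably have looked something like this sketch:

```sql
SELECT br.reading, br.sensor, br.datetime, ino.vessel
FROM db.milestones br FORCE INDEX (idx_sensors)
INNER JOIN db.projectdb ino
    ON br.datetime BETWEEN ino.date AND ino.enddate
WHERE ino.project IN ('project1', 'project2', 'project3', 'project4')
    AND br.sensor LIKE CONCAT(ino.vessel, '%');
```

Note that `FORCE INDEX` only tells the optimizer a table scan is assumed to be very expensive; with a `LIKE` pattern whose prefix comes from the joined row, there is still no constant prefix to seek on.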
Interestingly, if I reduce the number of conditions in `WHERE project IN ...`, the index on db.milestones.sensor is used and the processing time is much better.
2 Answers
Currently, you have two indices on the `project` column, one of which is unique, and there is no PK. You should drop the two existing indices and add a PK on `project`.
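A sketch of that change, with index names taken from the `SHOW INDEX` output above (`project` must be made `NOT NULL` before it can become the PK):

```sql
-- Drop both existing indexes on `project`, then promote it to the primary key.
ALTER TABLE projectdb
    DROP INDEX project_UNIQUE,
    DROP INDEX idx_project,
    MODIFY `project` varchar(10) NOT NULL,
    ADD PRIMARY KEY (`project`);
```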
The key to reasonable performance is a composite key on `(sensor, datetime)` and a concrete relationship between `vessel` and `sensor`.
Adding a simple join table (`vessel_sensor`), together with the composite index, would make a world of difference:
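The answer’s original code is not shown; a minimal sketch, assuming the vessel is the prefix before the first underscore in the sensor name (e.g. `V1_FT001` → `V1`), might look like:

```sql
-- Join table mapping each vessel to its sensors; one row per sensor.
CREATE TABLE vessel_sensor (
    vessel varchar(45) NOT NULL,
    sensor varchar(20) NOT NULL,
    PRIMARY KEY (vessel, sensor)
) ENGINE=InnoDB;

-- Populate it from the existing sensor names.
INSERT INTO vessel_sensor (vessel, sensor)
SELECT DISTINCT SUBSTRING_INDEX(sensor, '_', 1), sensor
FROM db.milestones;

-- The composite index on (sensor, datetime).
ALTER TABLE db.milestones ADD INDEX idx_sensor_datetime (sensor, datetime);
```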
Then add the join table to your existing query:
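A sketch of the rewritten query (column list unchanged): joining through `vessel_sensor` replaces the `LIKE` predicate with an equality on `sensor`, so each milestones lookup becomes an equality on `sensor` plus a range on `datetime`, exactly what the composite index serves.

```sql
SELECT
    br.reading
    ,CONCAT(ino.project, '_', SUBSTRING_INDEX(br.sensor, '_', -1)) AS new_metric
    ,ino.vessel
    ,ADDTIME(DATE('2023-01-01'), -TIMEDIFF(ino.date, br.datetime)) AS norm_date
    ,br.sensor
FROM db.projectdb ino
INNER JOIN vessel_sensor vs
    ON vs.vessel = ino.vessel
INNER JOIN db.milestones br
    ON br.sensor = vs.sensor
    AND br.datetime BETWEEN ino.date AND ino.enddate
WHERE ino.project IN ('project1', 'project2', 'project3', 'project4');
```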
These indexes may help, either with your query formulation or with user1191247's.
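The index definitions referred to are not shown; based on the join and filter columns in the question, they were presumably along these lines:

```sql
-- Equality on sensor plus a range on datetime can be served by one index.
ALTER TABLE db.milestones ADD INDEX idx_sensor_dt (sensor, datetime);
-- Covers the project filter and the date columns used in the join.
ALTER TABLE db.projectdb ADD INDEX idx_proj_dates (project, date, enddate);
```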
There is no `PRIMARY KEY` for the `projectdb` table. Perhaps you could make `project` `NOT NULL` and make it the PK? Note that a PK is a UNIQUE is a KEY, so your extra index on `project` is redundant.