I am using MariaDB to store a large number of measurements. A common query I run selects a date range. It takes rather long, something like 10 to 20 seconds, and it keeps getting worse, so I was playing around with my indices. I use EXPLAIN to check whether the indices are actually used.
But now EXPLAIN gives me confusing output. What it does seems to depend on the actual date range I select: for July it does something reasonable, for June it does not. I see it in the query times as well: July is much faster. What could be the cause?
This is the output. I did not change anything between these two queries except the date range. But somehow 'type' is different, 'key' is different, 'key_len' is different, and most notably, the June query scans all 16 million rows of the table.
EXPLAIN SELECT * FROM eurostat_dump WHERE station_name='Izana' AND last_update BETWEEN '2024-07-01' AND '2024-07-30';
+------+-------------+---------------+-------+--------------------------+-------------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------------+-------+--------------------------+-------------+---------+------+---------+-------------+
| 1 | SIMPLE | eurostat_dump | range | last_update,station_name | last_update | 3075 | NULL | 1972882 | Using where |
+------+-------------+---------------+-------+--------------------------+-------------+---------+------+---------+-------------+
MariaDB [slr_stats]> EXPLAIN SELECT * FROM eurostat_dump WHERE station_name='Izana' AND last_update BETWEEN '2024-06-01' AND '2024-06-30';
+------+-------------+---------------+------+--------------------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+---------------+------+--------------------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | eurostat_dump | ALL | last_update,station_name | NULL | NULL | NULL | 16446501 | Using where |
+------+-------------+---------------+------+--------------------------+------+---------+------+----------+-------------+
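If it helps to narrow this down, I could also run the June query with the index forced and compare the timing, to see whether the full scan is really the faster plan or just a bad estimate. Something like this (output not included here):

SELECT * FROM eurostat_dump FORCE INDEX (last_update)
WHERE station_name='Izana'
  AND last_update BETWEEN '2024-06-01' AND '2024-06-30';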
EDIT:
This is the table layout:
| eurostat_dump | CREATE TABLE `eurostat_dump` (
`ID` int(11) NOT NULL AUTO_INCREMENT,
`station_name` text DEFAULT NULL,
`last_update` text DEFAULT NULL,
`target_name` text DEFAULT NULL,
`status_code` text DEFAULT NULL,
`returns` int(11) DEFAULT NULL,
`prediction` text DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `last_update` (`last_update`(768)),
KEY `station_name` (`station_name`(768))
) ENGINE=InnoDB AUTO_INCREMENT=17848007 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |
Answers
Okay, thank you all for your very useful comments and for bothering with this newbie problem!
I did two things which drastically improved the performance:
I changed the type of station_name and last_update from TEXT to TINYTEXT. (For last_update it might be even better to use a DATETIME type, but that would have implications for the code that I need to check first.)
I added a composite index on station_name and last_update. The thing I really learned here is that having two separate indices on the columns you filter on is no substitute for a composite index on both. (It does make sense once you know it.)
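Roughly, the statements I used looked like this (from memory, so the index name and prefix lengths here are approximate; I dropped the old single-column indices first, since a 768-character prefix does not fit into a TINYTEXT column):

ALTER TABLE eurostat_dump DROP INDEX last_update, DROP INDEX station_name;

ALTER TABLE eurostat_dump
  MODIFY station_name TINYTEXT,
  MODIFY last_update TINYTEXT;

-- One composite index matching the WHERE clause: equality on station_name
-- first, then the range on last_update. TEXT-family columns still need a
-- prefix length; 60 and 30 characters are approximate.
ALTER TABLE eurostat_dump
  ADD INDEX station_update (station_name(60), last_update(30));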
That brought the average read time down from 20 seconds to 2 seconds. For reference, this is how the table looks now:
The second answer:
- last_update is a date; use DATE or DATETIME or DATETIME(6). Saves space and avoids prefixing issues.
- Don't use TINYTEXT; it is worse than the equivalent VARCHAR(255).
- When you have INDEX(a,b), don't also have INDEX(a).
- Normalize station_name and target_name to an INT of the smallest practical size.
Most of the above save space, hence improving speed.
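Put together, that advice points toward a layout roughly like this. This is only a sketch: the lookup table and the names station and station_id are made up for illustration, and the smallest practical integer size depends on how many distinct stations there are.

CREATE TABLE station (
  station_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- "smallest practical size"
  station_name VARCHAR(255) NOT NULL,
  PRIMARY KEY (station_id),
  UNIQUE KEY (station_name)
);

CREATE TABLE eurostat_dump (
  ID INT NOT NULL AUTO_INCREMENT,
  station_id SMALLINT UNSIGNED NOT NULL,    -- normalized instead of a TEXT station_name
  last_update DATETIME NOT NULL,            -- real date type, no index prefix needed
  target_name VARCHAR(255) DEFAULT NULL,    -- or normalized the same way as station_name
  status_code VARCHAR(255) DEFAULT NULL,
  returns INT DEFAULT NULL,
  prediction VARCHAR(255) DEFAULT NULL,
  PRIMARY KEY (ID),
  KEY station_update (station_id, last_update)  -- composite only, no extra KEY(station_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;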