skip to Main Content

I am using mariadb to store a large number of measurements. A common query I do selects for a data range. It takes a little long, something like 10 to 20 seconds, and it gets worse and worse, so I was playing around with my indices. I use EXPLAIN to see if the indices work.

But now EXPLAIN gives me a confusing output. It seems that what it does depends on the actual date range I select. For July it does something reasonable, for June not. I see it also in the query times: July is much faster. What could be the cause?

That is the output. I did not change anything else between these to queries. Only the date range. But somehow, ‘type’ is different, ‘key’ is different, ‘key_len’ is different, and most notably, it scans all 16 mio entries of the table.

EXPLAIN SELECT * FROM eurostat_dump WHERE station_name='Izana' AND last_update BETWEEN '2024-07-01' AND '2024-07-30';
+------+-------------+---------------+-------+--------------------------+-------------+---------+------+---------+-------------+
| id   | select_type | table         | type  | possible_keys            | key         | key_len | ref  | rows    | Extra       |
+------+-------------+---------------+-------+--------------------------+-------------+---------+------+---------+-------------+
|    1 | SIMPLE      | eurostat_dump | range | last_update,station_name | last_update | 3075    | NULL | 1972882 | Using where |
+------+-------------+---------------+-------+--------------------------+-------------+---------+------+---------+-------------+
MariaDB [slr_stats]> EXPLAIN SELECT * FROM eurostat_dump WHERE station_name='Izana' AND last_update BETWEEN '2024-06-01' AND '2024-06-30';
+------+-------------+---------------+------+--------------------------+------+---------+------+----------+-------------+
| id   | select_type | table         | type | possible_keys            | key  | key_len | ref  | rows     | Extra       |
+------+-------------+---------------+------+--------------------------+------+---------+------+----------+-------------+
|    1 | SIMPLE      | eurostat_dump | ALL  | last_update,station_name | NULL | NULL    | NULL | 16446501 | Using where |
+------+-------------+---------------+------+--------------------------+------+---------+------+----------+-------------+

EDIT:

This is the table layout:

| eurostat_dump | CREATE TABLE `eurostat_dump` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `station_name` text DEFAULT NULL,
  `last_update` text DEFAULT NULL,
  `target_name` text DEFAULT NULL,
  `status_code` text DEFAULT NULL,
  `returns` int(11) DEFAULT NULL,
  `prediction` text DEFAULT NULL,
  PRIMARY KEY (`ID`),
  KEY `last_update` (`last_update`(768)),
  KEY `station_name` (`station_name`(768))
) ENGINE=InnoDB AUTO_INCREMENT=17848007 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |

2

Answers


  1. Chosen as BEST ANSWER

    Okay, so thank you all for your very useful comments, and bothering with this newbie problem!

    I did two things now which drastically improved the performance:

    • I changed the type for station_name and last_update from TEXT to TINYTEXT. (for last update maybe it would be even better to use a datetime type, but that would have some implication on the code I would need to check first)

    • I added a composite index on station_name and last_update. The thing I really learned here is that having two indices for the things you look for is no substitute for having a composite index on the two. (it does kind of make sense once you know it).

    That brought down the average read time from 20 seconds to 2 seconds. For reference, that's how the table looks now:

    eurostat_dump | CREATE TABLE `eurostat_dump` (
      `ID` int(11) NOT NULL AUTO_INCREMENT,
      `station_name` tinytext DEFAULT NULL,
      `last_update` tinytext DEFAULT NULL,
      `target_name` text DEFAULT NULL,
      `status_code` text DEFAULT NULL,
      `returns` int(11) DEFAULT NULL,
      `prediction` text DEFAULT NULL,
      PRIMARY KEY (`ID`),
      KEY `last_update` (`last_update`(255)),
      KEY `station_name` (`station_name`(255)),
      KEY `composite_1` (`station_name`(255),`last_update`(255))
    ) ENGINE=InnoDB AUTO_INCREMENT=17889062 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |
    

    • If last_update is a date, use DATE or DATETIME or DATETIME(6). Saves space and avoids prefixing issues.
    • Never use TINYTEXT — it is worse than the equivalent VARCHAR(255).
    • When you have INDEX(a,b), don’t also have INDEX(a).
    • Normalize station_name and target_name to an INT of the smallest practical size.

    Most of the above save space, hence improving speed.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search