Mariadb / mysql EXPLAIN to optimize data query

DanielHampf
July 18, 2024
237 views
0 votes
2 Answers

I am using mariadb to store a large number of measurements. A common query I do selects for a data range. It takes a little long, something like 10 to 20 seconds, and it gets worse and worse, so I was playing around with my indices. I use EXPLAIN to see if the indices work.

But now EXPLAIN gives me a confusing output. It seems that what it does depends on the actual date range I select. For July it does something reasonable, for June not. I see it also in the query times: July is much faster. What could be the cause?

That is the output. I did not change anything else between these to queries. Only the date range. But somehow, ‘type’ is different, ‘key’ is different, ‘key_len’ is different, and most notably, it scans all 16 mio entries of the table.

EXPLAIN SELECT * FROM eurostat_dump WHERE station_name='Izana' AND last_update BETWEEN '2024-07-01' AND '2024-07-30';
+------+-------------+---------------+-------+--------------------------+-------------+---------+------+---------+-------------+
| id   | select_type | table         | type  | possible_keys            | key         | key_len | ref  | rows    | Extra       |
+------+-------------+---------------+-------+--------------------------+-------------+---------+------+---------+-------------+
|    1 | SIMPLE      | eurostat_dump | range | last_update,station_name | last_update | 3075    | NULL | 1972882 | Using where |
+------+-------------+---------------+-------+--------------------------+-------------+---------+------+---------+-------------+

MariaDB [slr_stats]> EXPLAIN SELECT * FROM eurostat_dump WHERE station_name='Izana' AND last_update BETWEEN '2024-06-01' AND '2024-06-30';
+------+-------------+---------------+------+--------------------------+------+---------+------+----------+-------------+
| id   | select_type | table         | type | possible_keys            | key  | key_len | ref  | rows     | Extra       |
+------+-------------+---------------+------+--------------------------+------+---------+------+----------+-------------+
|    1 | SIMPLE      | eurostat_dump | ALL  | last_update,station_name | NULL | NULL    | NULL | 16446501 | Using where |
+------+-------------+---------------+------+--------------------------+------+---------+------+----------+-------------+

EDIT:

This is the table layout:

| eurostat_dump | CREATE TABLE `eurostat_dump` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `station_name` text DEFAULT NULL,
  `last_update` text DEFAULT NULL,
  `target_name` text DEFAULT NULL,
  `status_code` text DEFAULT NULL,
  `returns` int(11) DEFAULT NULL,
  `prediction` text DEFAULT NULL,
  PRIMARY KEY (`ID`),
  KEY `last_update` (`last_update`(768)),
  KEY `station_name` (`station_name`(768))
) ENGINE=InnoDB AUTO_INCREMENT=17848007 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |

Answers

Chosen as BEST ANSWER
- DanielHampf
- July 12, 2024 at 1:54 pm
- 0 votes
0
Okay, so thank you all for your very useful comments, and bothering with this newbie problem!

I did two things now which drastically improved the performance:
- I changed the type for station_name and last_update from TEXT to TINYTEXT. (for last update maybe it would be even better to use a datetime type, but that would have some implication on the code I would need to check first)
- I added a composite index on station_name and last_update. The thing I really learned here is that having two indices for the things you look for is no substitute for having a composite index on the two. (it does kind of make sense once you know it).
That brought down the average read time from 20 seconds to 2 seconds. For reference, that's how the table looks now:
```
eurostat_dump | CREATE TABLE `eurostat_dump` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `station_name` tinytext DEFAULT NULL,
  `last_update` tinytext DEFAULT NULL,
  `target_name` text DEFAULT NULL,
  `status_code` text DEFAULT NULL,
  `returns` int(11) DEFAULT NULL,
  `prediction` text DEFAULT NULL,
  PRIMARY KEY (`ID`),
  KEY `last_update` (`last_update`(255)),
  KEY `station_name` (`station_name`(255)),
  KEY `composite_1` (`station_name`(255),`last_update`(255))
) ENGINE=InnoDB AUTO_INCREMENT=17889062 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci |
```

(Edit)

- RickJames
- July 18, 2024 at 3:47 am
- 0 votes
0
- If last_update is a date, use DATE or DATETIME or DATETIME(6). Saves space and avoids prefixing issues.
- Never use TINYTEXT — it is worse than the equivalent VARCHAR(255).
- When you have INDEX(a,b), don’t also have INDEX(a).
- Normalize station_name and target_name to an INT of the smallest practical size.
Most of the above save space, hence improving speed.
- Use daily (or other unit) Summary Tables — This leads to the queries that are much faster.
- More on Sensor Tables
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.