When I select all columns, the indexing does not work, but if I select just a few columns, it does.
mysql> explain SELECT * FROM fatura USE INDEX (datapago_serv_pago) WHERE id_servidor = 10 AND pago = '1' AND data_pago <= '2021-05-05';
+----+-------------+--------+------------+------+--------------------+------+---------+------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+--------------------+------+---------+------+----------+----------+-------------+
| 1 | SIMPLE | fatura | NULL | ALL | datapago_serv_pago | NULL | NULL | NULL | 10199216 | 0.00 | Using where |
+----+-------------+--------+------------+------+--------------------+------+---------+------+----------+----------+-------------+
mysql> explain SELECT uid_ FROM fatura USE INDEX (datapago_serv_pago) WHERE id_servidor = 10 AND pago = '1' AND data_pago <= '2021-05-05';
+----+-------------+--------+------------+-------+--------------------+--------------------+---------+------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+--------------------+--------------------+---------+------+---------+----------+--------------------------+
| 1 | SIMPLE | fatura | NULL | range | datapago_serv_pago | datapago_serv_pago | 5 | NULL | 5099608 | 0.01 | Using where; Using index |
+----+-------------+--------+------------+-------+--------------------+--------------------+---------+------+---------+----------+--------------------------+
CREATE TABLE `fatura` (
`uid_` bigint unsigned NOT NULL,
`uid_cliente` bigint unsigned NOT NULL,
`uid_cliente_servico` bigint unsigned NOT NULL,
`id_servidor` tinyint unsigned NOT NULL,
`id` int unsigned NOT NULL DEFAULT '0',
`data_cadastro` date NOT NULL,
`valor` decimal(12,2) NOT NULL DEFAULT '0.00',
`vencimento` date NOT NULL DEFAULT (0),
`pago` tinyint unsigned NOT NULL DEFAULT '0',
`data_pago` date NOT NULL DEFAULT (0),
`valor_pago` decimal(12,2) unsigned NOT NULL DEFAULT '0.00',
`historico` varchar(500) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL DEFAULT '',
`id_cliente` int unsigned NOT NULL DEFAULT (0),
`id_servico` int unsigned NOT NULL DEFAULT (0),
`nome` varchar(500) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci NOT NULL DEFAULT '',
`desativada` tinyint unsigned NOT NULL DEFAULT '0',
`operador_inclusao` varchar(50) NOT NULL,
`operador_liquidacao` varchar(50) NOT NULL,
`forma_pago` varchar(50) NOT NULL,
`status_banco` tinyint NOT NULL DEFAULT (0),
PRIMARY KEY (`uid_`),
KEY `vencimento` (`vencimento`) USING BTREE,
KEY `id_cliente` (`id_cliente`),
KEY `id_servico` (`id_servico`),
KEY `pago` (`pago`),
KEY `id_servidor` (`id_servidor`),
KEY `id` (`id`) USING BTREE,
KEY `uid_cliente` (`uid_cliente`),
KEY `data_pago` (`data_pago`),
KEY `data_cadastro` (`data_cadastro`),
KEY `historico` (`historico`),
KEY `uid_cliente_servico` (`uid_cliente_servico`),
KEY `desativada` (`desativada`),
KEY `operador_inclusao` (`operador_inclusao`),
KEY `operador_liquidacao` (`operador_liquidacao`),
KEY `venc_serv_pago` (`vencimento`,`id_servidor`,`pago`),
KEY `forma_pago` (`forma_pago`),
KEY `status_banco` (`status_banco`),
KEY `datapago_serv_pago` (`data_pago`,`id_servidor`,`pago`),
KEY `vencimento_serv_pago` (`data_pago`,`id_servidor`,`pago`)
)
If I select a range of dates, it works normally:
explain SELECT * FROM fatura USE INDEX (datapago_serv_pago) WHERE id_servidor = 10 AND pago = '1' AND data_pago >= '2021-04-01' AND data_pago <= '2021-05-05';
+----+-------------+--------+------------+-------+--------------------+--------------------+---------+------+--------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+-------+--------------------+--------------------+---------+------+--------+----------+-----------------------+
| 1 | SIMPLE | fatura | NULL | range | datapago_serv_pago | datapago_serv_pago | 5 | NULL | 158342 | 0.01 | Using index condition |
+----+-------------+--------+------------+-------+--------------------+--------------------+---------+------+--------+----------+-----------------------+
I’ve searched a lot throughout the documentation but I couldn’t find the reason or solution, does anyone know?
2
Answers
I think the reason the second query uses the index is because you’re only selecting the primary key column. While the
uid_
column isn’t explicitly in the index, all indexes effectively contain the primary key as the reference to the corresponding table rows.This means that it can satisfy the entire query from the index, without having to go to the actual table data, so it does so. When you use
SELECT *
it has to go to the table data. Then for some reason it decides that there’s not enough benefit to using the index to search for the rows, compared to a table scan. Notice that the number of rows selected is about half the table, which is not much benefit. If it’s going to have to fetch the full row data from half the table, it might as well just scan it all and collect the data as it goes.In the last query, the more specific range of dates narrows the number of rows even more, to only about 1% of the table, so using the index is much better even when fetching full row data.
MySQL won’t use an index if the cost-based optimizer estimates that it would be more work to use the index than to skip the index and just read all the rows.
Keep in mind that when making a secondary index lookup in InnoDB, it’s really two lookups. One to find an entry in the index, and also a primary key lookup to get the other columns of the table that are not part of the index.
So depending on how much of the table is matched by the search, it might be more costly to use the index. In my experience, if the query condition matches more than 20% of the table, the optimizer usually chooses not to use that index. I don’t think that’s an official or documented threshold, it’s just based on observation.
So I’d guess your condition on the first column of the index:
matches 20% or more of the table.
And when your conditions were for a more narrow range of dates:
this persuaded the optimizer that it would match a lesser portion of the table, so it was worth using the index.
The column referenced in this range condition,
data_pago
, is the first column of the index, which means the subsequent columns of that index won’t be used for the search. Basically, you can have several columns in the index help a search if and only if the leftmost columns are all used in equality conditions. Once you have one column used in any kind of inequality or range condition, that’s the last column of the index that will be used for searching or sorting.For example in your case, the index is on
(data_pago,id_servidor,pago)
, but the conditions aredata_pago <= '2021-05-05' AND id_servidor = 10 AND pago = '1'
(rearranging these terms to match the order of columns of the index). The first of these columns is a range condition, so the other two columns can’t be used.If you had defined the index with
data_pago
in the third position, then all three columns could be used, and that might narrow down the search better, so it matches substantially less than 20% of the table.These rules change when you select only the single column, instead of
SELECT *
. Then the optimizer reasons that it’s an index-only query, and it doesn’t bear the extra cost of reading the table rows. This means the columns it needs to fetch are all from the index itself (the primary key column is always part of that index in InnoDB). So it changes its mind, and uses the index.Note the
filtered: 0.01
reported by the second EXPLAIN. This means of the rows it examines, it will discard 99.99% of them by applying non-indexed conditions. This is poor optimization, because it means it must examine a lot of rows only to discard them. Ideal optimization would befiltered: 100.0
, which would indicate every row it examined through index lookups are in fact rows that belong in the result set.Apparently when doing a table-scan (
type: ALL
) thefiltered
column isn’t calculated, so it reports 0.0. This is not mentioned in the documentation.