Here’s my query:
SELECT COUNT(1)
FROM nhd
WHERE gnis_id IN (01372336);
The table definition has an INDEX gnis_id (gnis_id)
on it and, indeed, that index is used with this query. BUT when I replace the COUNT(1)
with *
the index is not used, at least according to EXPLAIN
. Why is that? Why would changing the columns I’m getting back change how MySQL is deciding which INDEX to use for the WHERE part of the query?
Here’s the output of the EXPLAIN with *
:
id: 1
select_type: SIMPLE
table: nhd
partitions: NULL
type: ALL
possible_keys: gnis_id
key: NULL
key_len: NULL
ref: NULL
rows: 2752840
filtered: 10.00
Extra: Using where
Here’s the output of SHOW INDEX FROM nhd
:
I’m running MySQL 8.0.33
2
Answers
From result of your
SHOW INDEX
statement, we can see the info regarding the table conditiongnis_id
:As the cardinality suggests, the selectivity for the index i.e the cardinality/table_row_count 16206 / 2752840 = 0.0059 is quite low. And by using
select *
, which has items beyond the stored values of the index, should MySQL decide to use the index, it would have to use the clustered key at the end of each index row to refer to the main table.With the aboved mentioned two issues:low selectivity and has to use clustered index to refer to main table for missing information, MySQL’s optimiser believes it’d better use a FULL TABLE scan (which is based upon the clustered index) instead, as it find the latter one worse in the trade-off between the (FULL TABLE scan) and (extra reference work plus low selectivity).
Here is a couple of things you can try:
First, use
select gnis_id
instead ofselect *
,this will have MySQL use the index as the index covers the value of gnis_id itself, which means no extra reference work is necessary (the same applies toselect count(1)
). You will find the access type to beref
in the excution plan, which makes the query response time significantly (hopefully) better than the access typeALL
which is caused byselect *
.Secondly, use the
select *
clause but at the end of the from clause addforce index(gnis_id)
to force MySQL to use the index.Find how long it takes to execute it and compare how long it takes to execute the one without forcing the index. You will see why MySQL deems it unnecessary to use the index.
A guess. Is this the declaration of the column?
Meanwhile, you are checking those strings against a number:
That means it has to convert each
gnis_id
to a number before testing.Solution:
With that change, it will look at only 96 rows, not 2,758,717 rows. (Actually, it will be 97 rows — the extra one is to discover that it is finished.)
Discussion
SELECT COUNT(1)
can be done entirely in theINDEX's
BTree. That is the index is "covering";EXPLAIN
will say "Using index". And, because the index is ordered bygnis_id
, the rows it needs to look at are consecutive once the datatypes match.SELECT *
needs to fetch all the columns, which are found only in the data’s BTree.The clustered index on
id
does not need to be involved.