skip to Main Content

Here’s my query:

SELECT COUNT(1)
FROM nhd
WHERE gnis_id IN (01372336);

The table definition has an INDEX gnis_id (gnis_id) on it and, indeed, that index is used with this query. BUT when I replace the COUNT(1) with * the index is not used, at least according to EXPLAIN. Why is that? Why would changing the columns I’m getting back change how MySQL is deciding which INDEX to use for the WHERE part of the query?

Here’s the output of the EXPLAIN with *:

id: 1
select_type: SIMPLE
table: nhd
partitions: NULL
type: ALL
possible_keys: gnis_id
key: NULL
key_len: NULL
ref: NULL
rows: 2752840
filtered: 10.00
Extra: Using where

Here’s the output of SHOW INDEX FROM nhd:

https://pastebin.com/KXpqBeJm

I’m running MySQL 8.0.33

2

Answers


  1. From result of your SHOW INDEX statement, we can see the info regarding the table condition gnis_id:

    Table: nhd
    Non_unique: 1
    Key_name: gnis_id
    Seq_in_index: 1
    Column_name: gnis_id
    Collation: A
    Cardinality: 16206
    Sub_part: NULL
    Packed: NULL
    Null:
    Index_type: BTREE
    Comment:
    Index_comment:
    Visible: YES
    Expression: NULL
    
    

    As the cardinality suggests, the selectivity for the index i.e the cardinality/table_row_count 16206 / 2752840 = 0.0059 is quite low. And by using select *, which has items beyond the stored values of the index, should MySQL decide to use the index, it would have to use the clustered key at the end of each index row to refer to the main table.
    With the aboved mentioned two issues:low selectivity and has to use clustered index to refer to main table for missing information, MySQL’s optimiser believes it’d better use a FULL TABLE scan (which is based upon the clustered index) instead, as it find the latter one worse in the trade-off between the (FULL TABLE scan) and (extra reference work plus low selectivity).
    Here is a couple of things you can try:

    First, use select gnis_id instead of select * ,this will have MySQL use the index as the index covers the value of gnis_id itself, which means no extra reference work is necessary (the same applies to select count(1)). You will find the access type to be ref in the excution plan, which makes the query response time significantly (hopefully) better than the access type ALL which is caused by select *.

    Secondly, use the select * clause but at the end of the from clause add force index(gnis_id) to force MySQL to use the index.

    SELECT *
    FROM nhd force index(gnis_id)
    WHERE gnis_id IN (01372336);
    

    Find how long it takes to execute it and compare how long it takes to execute the one without forcing the index. You will see why MySQL deems it unnecessary to use the index.

    Login or Signup to reply.
  2. A guess. Is this the declaration of the column?

    gnis_id VARCHAR(8)
    

    Meanwhile, you are checking those strings against a number:

    WHERE gnis_id IN (01372336)
    

    That means it has to convert each gnis_id to a number before testing.

    Solution:

    WHERE gnis_id IN ("01372336")
    

    With that change, it will look at only 96 rows, not 2,758,717 rows. (Actually, it will be 97 rows — the extra one is to discover that it is finished.)

    Discussion

    SELECT COUNT(1) can be done entirely in the INDEX's BTree. That is the index is "covering"; EXPLAIN will say "Using index". And, because the index is ordered by gnis_id, the rows it needs to look at are consecutive once the datatypes match.

    SELECT * needs to fetch all the columns, which are found only in the data’s BTree.

    The clustered index on id does not need to be involved.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search