The official MySQL documentation, section 8.8.5 Estimating Query Performance, offers this way to count disk seeks:
log(row_count) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1
In MySQL, an index block is usually 1,024 bytes and the data pointer is usually four bytes. For a 500,000-row table with a key value length of three bytes (the size of MEDIUMINT), the formula indicates log(500,000)/log(1024/3*2/(3+4)) + 1 = 4 seeks.
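Since MySQL's LOG() is the natural logarithm and the base cancels out in the ratio, the documentation's worked example can be reproduced directly in a MySQL session (just a quick sanity check of the arithmetic):

```sql
-- Estimated B-tree seeks for 500,000 rows, 3-byte keys, 4-byte data
-- pointers, and 1,024-byte index blocks (the documentation's values)
SELECT LOG(500000) / LOG(1024 / 3 * 2 / (3 + 4)) + 1 AS estimated_seeks;
-- Returns roughly 3.86, i.e. about 4 seeks
```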
InnoDB indexes are implemented as B-trees, and I understand the formula. But there are two points I cannot grasp in the sentence "In MySQL, an index block is usually 1,024 bytes and the data pointer is usually four bytes."
My questions are:
- InnoDB uses index-organized tables, so an index block should equal a page, whose size defaults to 16 KB (innodb_page_size). I googled index_block_length but found nothing, so why does it say "usually 1,024 bytes" here? Could someone point to the documentation or source code that explains index_block_length?
- On 64-bit MySQL, is the data pointer 8 bytes? Could someone point to the documentation or source code that explains the length of the data pointer?
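For reference, the 16 KB default I mention can be confirmed on any server:

```sql
-- innodb_page_size is read-only and fixed when the instance is created
SHOW VARIABLES LIKE 'innodb_page_size';
```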
2 Answers
First, please note that the calculation you are looking at is for MyISAM, not for InnoDB.
The documentation has a description of the MyISAM index file format, including references to the source code.
"index_block_length" refers to the block size:
It is specified by the system configuration variable
myisam-block-size
, defaulting to the mentioned 1024 bytes:The index entries look like this:
where "data_pointer_length" is the length of a "key pointer". On disk, so it has has nothing to do with using 32- or 64-bit memory pointers.
Its size is set by the myisam_data_pointer_size configuration variable. You may notice that the default size is 6 nowadays, not 4. And by "nowadays" I mean since MySQL 5.0.6, released in 2005, which might give you an idea how old that documentation section is, and why it never spells out that it is not talking about InnoDB.
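You can check the current value yourself, and also redo the documentation's example with the modern 6-byte pointer; the estimate barely changes:

```sql
-- Default is 6 since MySQL 5.0.6; the variable is dynamic and global
SHOW VARIABLES LIKE 'myisam_data_pointer_size';

-- The documentation's example recomputed with a 6-byte data pointer
SELECT LOG(500000) / LOG(1024 / 3 * 2 / (3 + 6)) + 1 AS estimated_seeks;
-- Returns roughly 4.03, still about 4 seeks
```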
I use the following guidelines (for InnoDB):
- A sufficiently large innodb_buffer_pool_size will (after sufficient "warm up") have everything cached in RAM. Hence there will be no disk reads for SELECTs or the "read" part of write operations.
or the "read" part of write operations.PRIMARY KEY, or secondary indexes) can be [crudely] estimated at 100. That is, for each node in the
B+Tree, there will be about 100 child nodes (or, in case of the bottom level, rows).
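To see what that rule of thumb implies, here is a rough sketch of how deep a B+Tree gets at fan-out 100 (the 100 is this answer's estimate, not a value MySQL reports):

```sql
-- With ~100 children per node, depth grows with log base 100 of the
-- row count: 1,000,000 rows needs about log(1,000,000)/log(100) levels
SELECT CEILING(LOG(1000000) / LOG(100)) AS btree_levels;
-- Returns 3: a million-row table is only about 3 levels deep
```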