When I use hexdump -C demo.ibd
to learn how mysql store data, I meet some problems. This is my demo table.
CREATE TABLE `demo` (
`id` int NOT NULL,
`age` int NOT NULL,
`name` varchar(20000) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii
INSERT INTO demo(id, age, name)
VALUES
(1, 10, 'joker'),
(2, 11, 'tom'),
(3, 12, 'jerry'),
(4, 12, 'aaaa...20000 in total...a'),
(5, 16, NULL);
MySQL 8.0.32, Row Format Dynamic.
The following is what I saw based on hexdump
, I split the binary data for analysis.
05 # lengths of variable-length columns
00 # a bit vector for indicating NULL columns
00 00 10 00 21 # 5-byte record header
80 00 00 01 # id: 1 [1]
00 00 00 00 05 1c # transaction_id
82 00 00 01 0a 01 10 # roll_pointer
80 00 00 0a # age: 10
6a 6f 6b 65 72 # name: "joker"
03 # lengths of variable-length columns
00 # a bit vector for indicating NULL columns
00 00 18 00 1f # 5-byte record header
80 00 00 02 # id: 2
00 00 00 00 05 1e # transaction_id
82 00 00 01 17 01 10 # roll_pointer
80 00 00 0b # age: 11
74 6f 6d # name: "tom"
05 # lengths of variable-length columns
00 # a bit vector for indicating NULL columns
00 00 20 00 20 # 5-byte record header
80 00 00 03 # id: 3
00 00 00 00 05 23 # transaction_id
81 00 00 01 10 01 10 # roll_pointer
80 00 00 0c # age: 12
6a 65 72 72 79 # name: jerry
# lengths of variable-length columns [2]
01 # a bit vector for indicating NULL columns
00 00 28 ff 91 # 5-byte record header
80 00 00 04 # id: 4
00 00 00 00 05 25 # transaction_id
81 00 00 01 11 01 10 # roll_pointer
80 00 00 0c # age: 12
# name: NULL
14 c0 # lengths of variable-length columns [3]
00 # a bit vector for indicating NULL columns
00 00 30 ff 74 # 5-byte record header
80 00 00 05 # id: 5
00 00 00 00 05 2d # transaction_id
82 00 00 01 10 01 10 # roll_pointer
80 00 00 10 # age: 16
00 00 00 02 00 00 00 05 # This is a 20-byte pointer which point to overflow page
00 00 00 01 00 00 00 00 # name: "aaa..." which length is 24000 Bytes
00 00 5d c0 # 0x5dc0 = 24000
I have three questions, at line 4, line 28 and line 37.
[1] Why does MySQL not use two’s complement to store signed integers? [2] Is there no extra byte for the length when the row’s variable-length fields are all NULL? [3] When a variable-length field is too long and cause page overflow, what the extra bytes indicate?I would really appraciate if you can explain for me.
2
Answers
I don’t know. But I can guess that it’s because MySQL was ported to many architectures, and they thought it was too difficult to implement two’s complement in a way that could detect overflows on multiple architectures. Or perhaps they wanted an integer storage format independent of architecture. These are just my guesses.
InnoDB does not store fields that are NULL. The row has a bitfield of which fields are NULL or non-NULL, and it stores nothing for the fields that are NULL.
Read https://dev.mysql.com/doc/refman/8.0/en/innodb-row-format.html, especially the paragraph that contains this passage:
If the string field is too long for the page, it stores a 20-byte pointer to the first "overflow" page. InnoDB uses a linked list of pages to store very long content.
In the same manual page, it says:
[4] "lengths of variable-length columns" — How does it parse the information if this field without first knowing how many columns and whether to use 1-byte or 2-byte lengths?
A guess: there is header info before those lengths. Back up a few bytes and check for such.
[1] Why does MySQL not use two’s complement to store signed integers? —
A guess (on top of what Bill said)… Flipping the top bit of a
SIGNED
2’s-compliment number would [I think] provide a bit pattern that would give the right answer for any comparison of SIGNED vs SIGNED or UNSIGNED vs UNSIGNED.To test that theory, try some
UNSIGNED
quantities.Other things to experiment with:
ROW_FORMAT
— Compact, Redundant, Dynamic, Compressed — These are stored differently.I look forward to your future experiments.