skip to Main Content

When I use hexdump -C demo.ibd to learn how mysql store data, I meet some problems. This is my demo table.

CREATE TABLE `demo` (
  `id`   int NOT NULL,
  `age`  int NOT NULL,
  `name` varchar(20000) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii

INSERT INTO demo(id, age, name) 
VALUES
(1, 10, 'joker'),
(2, 11, 'tom'),
(3, 12, 'jerry'),
(4, 12, 'aaaa...20000 in total...a'),
(5, 16, NULL);

MySQL 8.0.32, Row Format Dynamic.

The following is what I saw based on hexdump, I split the binary data for analysis.

enter image description here

05                          # lengths of variable-length columns
00                          # a bit vector for indicating NULL columns
00 00 10 00 21              # 5-byte record header
80 00 00 01                 # id: 1 [1]
00 00 00 00 05 1c           # transaction_id
82 00 00 01 0a 01 10        # roll_pointer
80 00 00 0a                 # age: 10
6a 6f 6b 65 72              # name: "joker"

03                          # lengths of variable-length columns
00                          # a bit vector for indicating NULL columns
00 00 18 00 1f              # 5-byte record header
80 00 00 02                 # id: 2
00 00 00 00 05 1e           # transaction_id
82 00 00 01 17 01 10        # roll_pointer
80 00 00 0b                 # age: 11
74 6f 6d                    # name: "tom"

05                          # lengths of variable-length columns
00                          # a bit vector for indicating NULL columns
00 00 20 00 20              # 5-byte record header
80 00 00 03                 # id: 3
00 00 00 00 05 23           # transaction_id
81 00 00 01 10 01 10        # roll_pointer
80 00 00 0c                 # age: 12
6a 65 72 72 79              # name: jerry

                            # lengths of variable-length columns [2]
01                          # a bit vector for indicating NULL columns
00 00 28 ff 91              # 5-byte record header
80 00 00 04                 # id: 4
00 00 00 00 05 25           # transaction_id
81 00 00 01 11 01 10        # roll_pointer
80 00 00 0c                 # age: 12
                            # name: NULL

14 c0                       # lengths of variable-length columns [3]
00                          # a bit vector for indicating NULL columns
00 00 30 ff 74              # 5-byte record header
80 00 00 05                 # id: 5
00 00 00 00 05 2d           # transaction_id
82 00 00 01 10 01 10        # roll_pointer
80 00 00 10                 # age: 16
00 00 00 02 00 00 00 05     # This is a 20-byte pointer which point to overflow page
00 00 00 01 00 00 00 00     # name: "aaa..." which length is 24000 Bytes
00 00 5d c0                 # 0x5dc0 = 24000

I have three questions, at line 4, line 28 and line 37.

[1] Why does MySQL not use two’s complement to store signed integers?

[2] Is there no extra byte for the length when the row’s variable-length fields are all NULL?

[3] When a variable-length field is too long and cause page overflow, what the extra bytes indicate?

I would really appraciate if you can explain for me.

2

Answers


    1. I don’t know. But I can guess that it’s because MySQL was ported to many architectures, and they thought it was too difficult to implement two’s complement in a way that could detect overflows on multiple architectures. Or perhaps they wanted an integer storage format independent of architecture. These are just my guesses.

    2. InnoDB does not store fields that are NULL. The row has a bitfield of which fields are NULL or non-NULL, and it stores nothing for the fields that are NULL.

      Read https://dev.mysql.com/doc/refman/8.0/en/innodb-row-format.html, especially the paragraph that contains this passage:

      The variable-length part of the record header contains a bit vector for indicating NULL columns. … Columns that are NULL do not occupy space other than the bit in this vector.

    3. If the string field is too long for the page, it stores a 20-byte pointer to the first "overflow" page. InnoDB uses a linked list of pages to store very long content.

      In the same manual page, it says:

      InnoDB can store long variable-length column values (for VARCHAR, VARBINARY, and BLOB and TEXT types) fully off-page, with the clustered index record containing only a 20-byte pointer to the overflow page.

    Login or Signup to reply.
  1. [4] "lengths of variable-length columns" — How does it parse the information if this field without first knowing how many columns and whether to use 1-byte or 2-byte lengths?

    A guess: there is header info before those lengths. Back up a few bytes and check for such.

    [1] Why does MySQL not use two’s complement to store signed integers? —

    A guess (on top of what Bill said)… Flipping the top bit of a SIGNED 2’s-compliment number would [I think] provide a bit pattern that would give the right answer for any comparison of SIGNED vs SIGNED or UNSIGNED vs UNSIGNED.

    To test that theory, try some UNSIGNED quantities.

    Other things to experiment with:

    • VARCHAR with under 40 bytes, up to 8000 bytes, etc.
    • CHARACTER SET utf8mb4 with ascii characters; with Emoji.
    • ENUM with < 256 options; more than.
    • 2 VARCHAR columns; one with <= 191 and one >8K.
    • TINYTEXT. (This datatype seems completely redundant, and slower than VARCHAR(255) in some situations.)
    • A table with all columns NOT NULL. Will the null-bits go away?
    • ROW_FORMAT — Compact, Redundant, Dynamic, Compressed — These are stored differently.
    • The layout after some row (other than the last) is DELETEd.

    I look forward to your future experiments.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search