Description
To test our updates, we compare an old, updated database schema with a fresh installation. The two schemas should be identical.
We use SHOW CREATE TABLE
to find the differences. However, we noticed that with MySQL 8 we find some differences based on how the tables were constructed. Some columns include CHARACTER SET utf8mb4
, while others do not.
How to reproduce:
CREATE TABLE t1 (
col1 VARCHAR(1024) NOT NULL,
col2 VARCHAR(1024) COLLATE utf8mb4_unicode_ci NOT NULL
);
SHOW CREATE TABLE t1;
and the output:
CREATE TABLE `t1` (
`col1` varchar(1024) COLLATE utf8mb4_unicode_ci NOT NULL,
`col2` varchar(1024) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
I tried to figure out what’s different about the two columns, but they look absolutely identical:
SELECT * FROM information_schema.`COLUMNS`
WHERE table_schema = "db" AND table_name = "t1";
col1 col2
----- ----
TABLE_CATALOG: def def
TABLE_SCHEMA: db db
TABLE_NAME: t1 t1
COLUMN_NAME: col1 col2
ORDINAL_POSITION: 1 2
COLUMN_DEFAULT:
IS_NULLABLE: NO NO
DATA_TYPE: varchar varchar
CHARACTER_MAXIMUM_LENGTH: 1024 1024
CHARACTER_OCTET_LENGTH: 4096 4096
NUMERIC_PRECISION: NULL NULL
NUMERIC_SCALE: NULL NULL
DATETIME_PRECISION: NULL NULL
CHARACTER_SET_NAME: utf8mb4 utf8mb4
COLLATION_NAME: utf8mb4_unicode_ci utf8mb4_unicode_ci
COLUMN_TYPE: varchar(1024) varchar(1024)
COLUMN_KEY:
EXTRA:
PRIVILEGES: select,insert,update,references select,insert,update,references
COLUMN_COMMENT:
GENERATION_EXPRESSION:
SRS_ID: NULL NULL
Question
How do those columns differ? They both use the character set utf8mb4
and the collation utf8mb4_unicode_ci
. Is it just a flag somewhere? And what would that flag do?
2
Answers
MySQL 8.0.11 changed the behavior of
SHOW CREATE TABLE
. It now shows the collation and character set if they have been explicitly specified. (https://bugs.mysql.com/bug.php?id=46239)For this, each column remembers if it has been explicitly assigned a collation or character set. The relevant code can be found in
store_create_info
:(NB: I believe the comment is wrong.)
The columns are exactly the same otherwise. I believe this is done to make the code created by
SHOW CREATE TABLE
portable to databases that use other default charsets and collations.You should read MySQL’s excellent documentation which explains the difference between character sets and collations. Character sets determine which characters can be stored in a column or table, while collations determine how characters are compared against each other. Here again is your column definition:
This means that
utf8mb4
characters can be stored incol1
. However, it providesutf8mb4_unicode_ci
as the collation, which is case insensitive. This means that comparing e.g.m
andM
would result in both characters being the same for sorting and other operations.