skip to Main Content

Description

To test our updates, we compare an old, updated database schema with a fresh installation. The two schemas should be identical.

We use SHOW CREATE TABLE to find the differences. However, we noticed that with MySQL 8 we find some differences based on how the tables were constructed. Some columns include CHARACTER SET utf8mb4, while others do not.

How to reproduce:

CREATE TABLE t1 (
    col1 VARCHAR(1024) NOT NULL,
    col2 VARCHAR(1024) COLLATE utf8mb4_unicode_ci NOT NULL
);
SHOW CREATE TABLE t1;

and the output:

CREATE TABLE `t1` (
  `col1` varchar(1024) COLLATE utf8mb4_unicode_ci NOT NULL,
  `col2` varchar(1024) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

I tried to figure out what’s different about the two columns, but they look absolutely identical:

SELECT * FROM information_schema.`COLUMNS`
    WHERE table_schema = "db" AND table_name = "t1";

                          col1                               col2
                          -----                              ----
           TABLE_CATALOG: def                                def 
            TABLE_SCHEMA: db                                 db
              TABLE_NAME: t1                                 t1
             COLUMN_NAME: col1                               col2
        ORDINAL_POSITION: 1                                  2
          COLUMN_DEFAULT:
             IS_NULLABLE: NO                                 NO
               DATA_TYPE: varchar                            varchar
CHARACTER_MAXIMUM_LENGTH: 1024                               1024
  CHARACTER_OCTET_LENGTH: 4096                               4096
       NUMERIC_PRECISION: NULL                               NULL
           NUMERIC_SCALE: NULL                               NULL 
      DATETIME_PRECISION: NULL                               NULL
      CHARACTER_SET_NAME: utf8mb4                            utf8mb4
          COLLATION_NAME: utf8mb4_unicode_ci                 utf8mb4_unicode_ci
             COLUMN_TYPE: varchar(1024)                       varchar(1024)
              COLUMN_KEY:
                   EXTRA:
              PRIVILEGES: select,insert,update,references    select,insert,update,references
          COLUMN_COMMENT:
   GENERATION_EXPRESSION:
                  SRS_ID: NULL                               NULL

Question

How do those columns differ? They both use the character set utf8mb4 and the collation utf8mb4_unicode_ci. Is it just a flag somewhere? And what would that flag do?

2

Answers


  1. Chosen as BEST ANSWER

    MySQL 8.0.11 changed the behavior of SHOW CREATE TABLE. It now shows the collation and character set if they have been explicitly specified. (https://bugs.mysql.com/bug.php?id=46239)

    For this, each column remembers if it has been explicitly assigned a collation or character set. The relevant code can be found in store_create_info:

    /*
      For string types dump charset name only if field charset is same as
      table charset or was explicitly assigned.
     */
    if (field->charset() != share->table_charset ||
        column_has_explicit_collation) {
        packet->append(STRING_WITH_LEN(" CHARACTER SET "));
        packet->append(field->charset()->csname);
    }
    

    (NB: I believe the comment is wrong.)

    The columns are exactly the same otherwise. I believe this is done to make the code created by SHOW CREATE TABLE portable to databases that use other default charsets and collations.


  2. You should read MySQL’s excellent documentation which explains the difference between character sets and collations. Character sets determine which characters can be stored in a column or table, while collations determine how characters are compared against each other. Here again is your column definition:

    col1 VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT ''
    

    This means that utf8mb4 characters can be stored in col1. However, it provides utf8mb4_unicode_ci as the collation, which is case insensitive. This means that comparing e.g. m and M would result in both characters being the same for sorting and other operations.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search