skip to Main Content

I have been moving A DATABASE from MySQL 5.0 (Fedora Core 7) to MariaDB 10.5 into recent OS Rocky 9.1. As far as ASCII 1 byte text data go, no problem. Now, I have plenty of 2-bytes characters in my certain Database. Russian, Hangul, Japanese, Chinese letters. I wasn’t successful to compile the Data into MariaDB 10.5. Can I tell MariaDB while I am compiling the Data, please handle the Data contains the 2-bytes Character. Or I have to adjust the computer’s environmental settings to accommodate 2-bytes letters.

Anyway I post the my .sql code. Would you take a look at?

DROP TABLE IF EXISTS editors;
  SET @saved_cs_client = @@character_set_client;
  SET character_set_client = utf8;
CREATE TABLE editors (
  id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
  editor_id       SMALLINT UNSIGNED,  
  language        CHAR(100),   
  name            CHAR(250), 
  dummy_field     CHAR(10),  
  PRIMARY KEY(id)              
);

INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('1', 'cn:', 'Gengjie Yang', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('2', 'de:', 'Herb Wirtz', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('3', 'en:', 'Diane Gegala', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('4', 'en:', 'Maurice Barnfather', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('5', 'en:', 'Florence Smith', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('6', 'es:', 'Felix Sutherland', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('7', 'es:', 'Enrique Cob', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('8', 'fr:', 'Gina Loyonnet', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('9', 'it:', 'Dr. John Termini : Daniela Castanotto', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('10', 'kr:', '수김선생 (Sue B. Kim)', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('11', 'pt:', 'Dr. Lilicca Moreira', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('12', 'ru:', 'Тимур Мукминов (Timur Mukminov)', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('13', 'ru:', 'Даша Авдулова (Dasha Avdulova)', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('14', 'it:', 'Grazia Manzone', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('15', 'de:', 'Herb Bauer', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('16', 'en:', 'Bill Reardon', 'ZZZ' );
INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('17', 'es:', 'Mario Araujo, Prs.', 'ZZZ' );
 

Then, I got a complaining message from MariaDB stating that the line 22, 24, and 25 caused errors. While the rest of lines are INSERTed normally.

Error Message during Insertion MariaDB

Result Review by after Insertion

It is obvious that errors occurred only at Hangul Letter line 22 and Russia Letter lines 24 and 25.

I don’t know how to tell or configure MariaDB 10.5 in order to accept 2-bytes letter smoothly? Would you suggest any possible way to fix this error?

2

Answers


  1. Try to create table in this way:

    CREATE TABLE editors (
     id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
     editor_id       SMALLINT UNSIGNED,  
     language        CHAR(100),   
     name            CHAR(250), 
     dummy_field     CHAR(10),  
     PRIMARY KEY(id)              
    )
    CHARACTER SET 'utf8mb4' 
    COLLATE 'utf8mb4_czech_ci';
    

    and then try to

    INSERT INTO editors  ( editor_id, language, name, dummy_field ) VALUES ('12', 'ru:', 'Тимур Мукминов (Timur Mukminov)', 'ZZZ' );
    

    https://mariadb.com/kb/en/setting-character-sets-and-collations/

    Login or Signup to reply.
    • Not just 2-byte characters for Europe, but also 3-and even 4-byte characters for Asia.
    • These 6 bytes (shown in hex) EC8898EAB980, for example, represent 2 Korean characters 수김.
    • Chinese needs utf8mb4, so don’t stop at utf8 (aka utf8mb3).
    • If you have those bytes sitting in a column declared to be CHARACTER SET latin1, then you have the "Mojibake" problem.
    • See Trouble with UTF-8 characters; what I see is not what I stored

    One of these may work; try each carefully; there’s no undo.

    CHARACTER SET latin1, but have utf8 bytes in it; leave bytes alone while fixing charset:
    First, lets assume you have this declaration for tbl.col:

    col VARCHAR(111) CHARACTER SET latin1 NOT NULL
    

    Then to convert the column without changing the bytes:

    ALTER TABLE tbl MODIFY COLUMN col VARBINARY(111) NOT NULL;
    ALTER TABLE tbl MODIFY COLUMN col VARCHAR(111) CHARACTER SET utf8mb4 NOT NULL;
    

    Note: If you start with TEXT, use BLOB as the intermediate definition. (This is the "2-step ALTER", as discussed elsewhere.) (Be sure to keep the other specifications the same – VARCHAR, NOT NULL, etc.)

    CHARACTER SET utf8mb4 with Mojibake:

    UPDATE tbl SET col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4);
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search