I come back with utf8mb4 character issues. I’m using mariadb Server version: 10.3.37-MariaDB-0ubuntu0.20.04.1 Ubuntu 20.04 on a local machine with Linux Mint 20.3.
Here is a test table:
CREATE TABLE chartable (
k1 VARCHAR(5)
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
INSERT INTO chartable (k1) values
('食𣅀C' ),
('食B𡃀'),
('亰BC');
Now the issues:
MariaDB [test]> select * from chartable where k1 like '%𣅀%';
+----------+
| k1 |
+----------+
| 食𣅀C |
| 食B𡃀 |
+----------+
2 rows in set (0,008 sec)
MariaDB [test]> select hex('𣅀');
+----------+
| hex('?') |
+----------+
| F0A38580 |
+----------+
1 row in set (0,001 sec)
I notice the first query doesn’t return what I expected, which is 食𣅀C
Second query displays a ? instead of 𣅀
I ensured that character set and collations are consistent, database and table have the same character set and collation, utf8mb4_general_ci as I was told in this thread https://stackoverflow.com/questions/74975518/false-duplicate-with-mariadb-and-chinese-characters/74975829#74975829
MariaDB [nihongo]> show variables like 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
and:
MariaDB [nihongo]> show variables like 'collation%';
+----------------------+--------------------+
| Variable_name | Value |
+----------------------+--------------------+
| collation_connection | utf8mb4_general_ci |
| collation_database | utf8mb4_general_ci |
| collation_server | utf8mb4_general_ci |
+----------------------+--------------------+
I’ve tested with others tables and got the same type of issues.
Could you explain me what I’m doing wrong and what to do to use correctly LIKE with mariadb and chinese characters? Thank you in advance
2
Answers
Thank you for the response. The problem is I've changed from
utf8mb4_unicode_ci
toutf8mb4_general_ci
because similar issues occured withutf8mb4_unicode_ci
, as I was told in this thread.Let's add the two following characters in the chartable:
Then I test SELECT with the two collations
Here I notice the right answer comes with
utf8mb4_general_ci
collation, whileutf8mb4_unicode_ci
doesn't return the right answer. The two collations seem each to be appropriate for some unicode characters, and not for others. How to manage this?utf8mb4_general_ci is a simplified faster sort/comparison implementation . It does not follow the Unicode rules and might result in undesirable sorting or comparison.
Instead of you should use utf8mb4_unicode_ci, which is based on the official unicode rules for ,sorting and comparison.
hex('%𣅀%')
is displayed ashex('?')
since the use of unicode characters in identifier names is limited to characters < +U10000 (see also https://mariadb.com/kb/en/identifier-names/).