skip to Main Content

I need to convert latin1 charset of a table to utf8.

Quoting from mysql docs:

The CONVERT TO operation converts column values between the original and named character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8mb4). In this case, you have to do the following for each such column:

ALTER TABLE t1 CHANGE c1 c1 BLOB;

ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8mb4;

This answer shows how to find out charset at DB level, table level, and column level. But I need to find out the charset of the actual stored values. How can I do that?

Since my connector/j jdbc connection string doesn’t specify any characterEncoding or connectionCollation properties, it is possible that it used utf8 by default to store the values, in which case I don’t need any conversion, just change the table metadata.

mysql-connector-java version: 8.0.22

mysql database version: 5.6

spring boot version: 2.5.x

2

Answers


  1. The character set of the string in a given column should be the same as the column definition.

    There have been cases where people accidentally store the bytes of the wrong encoding in a column. For example, they store bytes of a latin1 encoding in a utf8 field. This is a terrible idea, because queries can’t tell the difference. Those bytes may not be valid values of the column’s defined encoding, and this results in garbage data. Cleaning up a table where some of the strings are stored in the wrong encoding is an unpleasant chore.

    So I strongly urge you to store only strings encoded in a compatible way according to the column’s definition, and to assume that all strings are stored this way.

    Login or Signup to reply.
  2. To answer the title:

    SHOW CREATE TABLE tablename shows the detault charset for the table and any overrides for individual columns.

    Don’t blindly use CONVERT TO, especially the 2-step ALTER you are showing. Let’s see what is in the table now (SELECT col, HEX(col) ... for something with accented text.

    See Trouble with UTF-8 characters; what I see is not what I stored for the main 4 types of problems.

    This gives several cases and how to fix them. http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
    One case involves using CONVERT TO; two other cases involve using BLOB or VARBINARY.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search