Please explain this situation: I have a MySQL database whose connection is set to latin1, and whose tables and columns also use the latin1 character set. I send it UTF-8 encoded data (e.g. from a web form which encodes as UTF-8 via PHP). Later I retrieve that data and display it on a web page set to use UTF-8 encoding.

Will I see what I put in? I know I will for ASCII, but how about for e.g. ü (a German umlaut)? And for a more exotic, non-latin1 character?

In fact my simple test of sending UTF-8 encoded Japanese characters to the database from WordPress and then viewing them on a webpage suggests that there are no problems. I suspect that there is no conversion and that the database just stores the bytes it gets. But, if this is the case what is the significance of setting a character set? It is not a collation, which (I think) is to do with sorting.

Thank you

2 Answers


  1. The database will attempt to convert your latin1 connection data to its internal encoding (typically utf8mb4, which is determined during installation). This process is likely to corrupt your string.

    Even if the conversion succeeded, searching and sorting on this column would not work reliably. If no alternative is available, it would be preferable to store your UTF-8 string in a VARBINARY column.
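
To illustrate why UTF-8 bytes held in a latin1 column can come back intact and still confuse searching and ordering, here is a minimal Python sketch. It does not connect to MySQL: Python's `latin-1` codec stands in for the server's latin1 character set (an assumption; MySQL's latin1 is closer to cp1252, though both agree on the bytes used here), and the sample word is made up.

```python
# Simulate UTF-8 bytes landing in a latin1 column with no conversion.
original = "für"                        # sample text typed into the form
utf8_bytes = original.encode("utf-8")   # what the form sends: b'f\xc3\xbcr'

# What a latin1 column believes it holds: one character per byte.
as_latin1 = utf8_bytes.decode("latin-1")

# Reading back over a latin1 connection yields the same bytes,
# so a UTF-8 page displays the original text again.
round_trip = as_latin1.encode("latin-1").decode("utf-8")

print(repr(as_latin1))                 # 'fÃ¼r'
print(len(original), len(as_latin1))   # 3 4
print(round_trip == original)          # True
```

The text survives the round trip, but under this model the column holds four characters rather than three and compares on `Ã¼` instead of `ü`, which is how length limits, case-insensitive matches, and sort order can end up wrong.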

  2. See Trouble with UTF-8 characters; what I see is not what I stored

    latin1 can handle umlaut-u and other Western European characters. But you must tell MySQL that the client is talking latin1, and declare the columns to be utf8mb4. (Or whatever combination you have.)

    Latin1 cannot handle any Asian character set.
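
A quick way to check which characters latin1 can represent is to try encoding them. The sketch below uses Python's `latin-1` codec as a stand-in for MySQL's latin1 (an assumption; the two differ slightly in the 0x80-0x9F range), and the helper name `fits_latin1` is hypothetical.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text can be encoded as latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

print(fits_latin1("ü"))        # True
print(fits_latin1("日本語"))    # False
```

When MySQL actually has to convert a character with no latin1 equivalent, it typically substitutes `?` or raises an error, depending on the SQL mode.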
