Please explain this situation: I have a MySQL database whose connection character set is latin1 and whose tables and columns also use latin1. I send it UTF-8 encoded data (e.g. from a web form that PHP submits as UTF-8). Later I retrieve that data and display it on a web page set to use UTF-8 encoding.
Will I see what I put in? I know I will for ASCII, but what about e.g. ü, a German umlaut? And what about a more exotic, non-latin1 character?
In fact, my simple test of sending UTF-8 encoded Japanese characters to the database from WordPress and then viewing them on a web page suggests that there are no problems. I suspect that there is no conversion and that the database just stores the bytes it gets. But if this is the case, what is the significance of setting a character set? It is not a collation, which (I think) has to do with sorting.
Thank you
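The asker's hypothesis (no conversion, bytes stored as-is) can be illustrated with a small Python simulation. It does not talk to a real server, the helper names are hypothetical, and it assumes MySQL's latin1 behaves like Windows cp1252 with the five bytes cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) passed through as the matching U+0081-range code points. Under that assumption every byte value is a valid latin1 character, so a latin1 connection writing to a latin1 column stores any byte sequence unchanged, UTF-8 included.

```python
# Simulation only: approximates MySQL's latin1 as cp1252 plus five
# otherwise-undefined bytes mapped straight to U+0081, U+008D, U+008F,
# U+0090 and U+009D (an assumption, not a call into MySQL).
UNDEFINED_IN_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def mysql_latin1_decode(raw: bytes) -> str:
    """Read every byte as one latin1 character (all 256 values are valid)."""
    chars = []
    for b in raw:
        if b in UNDEFINED_IN_CP1252:
            chars.append(chr(b))
        else:
            chars.append(bytes([b]).decode("cp1252"))
    return "".join(chars)


def mysql_latin1_encode(text: str) -> bytes:
    """Inverse of mysql_latin1_decode; raises for characters latin1 cannot hold."""
    out = bytearray()
    for ch in text:
        if ord(ch) in UNDEFINED_IN_CP1252:
            out.append(ord(ch))
        else:
            out += ch.encode("cp1252")
    return bytes(out)


def store_and_fetch_latin1(client_bytes: bytes) -> bytes:
    """Hypothetical round trip: latin1 connection -> latin1 column -> latin1 connection."""
    # INSERT: connection and column charsets are both latin1, so the
    # conversion is a no-op and the bytes land in the column unchanged.
    stored = mysql_latin1_encode(mysql_latin1_decode(client_bytes))
    # SELECT: column latin1 -> connection latin1, again a no-op.
    return mysql_latin1_encode(mysql_latin1_decode(stored))


for text in ["ü", "日本語"]:
    sent = text.encode("utf-8")  # what the UTF-8 web form submits
    returned = store_and_fetch_latin1(sent)
    # The browser decodes the returned bytes as UTF-8 again; the last value
    # is how many characters MySQL believes the column holds.
    print(text, returned == sent, returned.decode("utf-8"), len(mysql_latin1_decode(sent)))
# ü True ü 2
# 日本語 True 日本語 9
```

If this model is right, the bytes survive, but MySQL sees "ü" as two latin1 characters and "日本語" as nine. That is where the declared character set matters: CHAR_LENGTH, case-insensitive comparisons and ORDER BY operate on those latin1 characters, not on the characters the browser eventually shows.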
Answers
The database will attempt to convert your latin1 connection data to the character set of the target column (on current servers typically utf8mb4, the default chosen during installation). This process is likely to corrupt your string. Even if the conversion were successful, searching or ordering this column would not work correctly. If no alternatives are available, it would be preferable to store your UTF-8 string as a varbinary type. See "Trouble with UTF-8 characters; what I see is not what I stored".
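A minimal sketch of the corruption described above, assuming the connection is declared latin1, the client actually sends UTF-8, and the target column is utf8mb4. Python's cp1252 codec (plus five pass-through bytes) stands in for MySQL's latin1; the function name and the column_type strings are hypothetical. A varbinary column is modelled as storing the bytes verbatim.

```python
# Approximation of MySQL's latin1: cp1252, plus five undefined bytes kept
# as the matching U+0081-range code points (an assumption, not MySQL code).
UNDEFINED_IN_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def mysql_latin1_decode(raw: bytes) -> str:
    """Read every byte as one latin1 character."""
    return "".join(
        chr(b) if b in UNDEFINED_IN_CP1252 else bytes([b]).decode("cp1252")
        for b in raw
    )


def insert_via_latin1_connection(client_bytes: bytes, column_type: str) -> bytes:
    """Hypothetical model of what ends up on disk when the connection says latin1."""
    if column_type == "varbinary":
        # Binary columns have no character set, so nothing is converted.
        return client_bytes
    if column_type == "utf8mb4":
        # Each incoming byte is read as one latin1 character and then
        # re-encoded as UTF-8: the classic "double encoding".
        return mysql_latin1_decode(client_bytes).encode("utf-8")
    raise ValueError(f"unsupported column type: {column_type}")


sent = "ü".encode("utf-8")  # b'\xc3\xbc' from the UTF-8 web form
stored = insert_via_latin1_connection(sent, "utf8mb4")
print(stored)  # b'\xc3\x83\xc2\xbc'
print(stored.decode("utf-8"))  # Ã¼ (what a utf8mb4 connection would read back)
print(insert_via_latin1_connection(sent, "varbinary") == sent)  # True
```

Read back over the same latin1 connection, the conversion would be reversed, so the page can still look right while the column itself holds "Ã¼". That mismatch is why searches and sorting on such a column misbehave.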
latin1 can handle umlaut-u and other Western European characters. But you must tell MySQL which encoding the client is actually sending (e.g. latin1), and declare the columns' character set (e.g. utf8mb4); MySQL then converts between the two. (Or whatever combination you have.)
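The recommended setup can be sketched as a hypothetical model, not MySQL code. It assumes the server decodes incoming bytes with whatever connection character set the client declares and re-encodes them for a utf8mb4 column, with Python's cp1252 codec standing in for MySQL's latin1. If each client declares the encoding it really uses, both end up storing identical bytes.

```python
def insert_with_declared_charsets(client_bytes: bytes, connection_codec: str) -> bytes:
    """Hypothetical model: decode with the declared connection charset, store as utf8mb4."""
    text = client_bytes.decode(connection_codec)
    # utf8mb4 is full UTF-8, so Python's "utf-8" codec models the column.
    return text.encode("utf-8")


# A client that really sends latin1 (cp1252 here) and declares latin1:
latin1_client = "Grüße".encode("cp1252")
# A client that really sends UTF-8 and declares it (e.g. SET NAMES utf8mb4):
utf8_client = "Grüße".encode("utf-8")

a = insert_with_declared_charsets(latin1_client, "cp1252")
b = insert_with_declared_charsets(utf8_client, "utf-8")
print(a == b, a.decode("utf-8"))  # True Grüße
```

The point of the design is that the declared connection charset only has to be truthful; the column charset is chosen independently, and the server bridges the two.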
latin1 cannot represent any Asian characters, such as Japanese.
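A quick way to check that limit, again using Python's cp1252 codec as an approximation of MySQL's latin1 (it rejects the five bytes cp1252 leaves undefined, which MySQL's latin1 accepts). The helper name is hypothetical.

```python
def fits_in_latin1(text: str) -> bool:
    """Approximation: can every character be stored in a latin1 column?"""
    try:
        text.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


for sample in ["Grüße", "café", "日本語"]:
    print(sample, fits_in_latin1(sample))
# Grüße True
# café True
# 日本語 False
```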