
I have a question about the correct use of $mysqli->set_charset(). I haven’t used this function on my site for years. Now I’m rewriting my connection script and want to apply $mysqli->set_charset() properly. At the moment the site is still based on ‘latin1’, but I will soon switch to UTF-8 (utf8mb4).

MySQLi on my server (which I manage myself) has been configured with latin1 for years. I assume it wouldn’t hurt to add this $mysqli->set_charset("latin1") now?

And is it true that if MySQLi were configured with utf8mb4 by default, then without a call to $mysqli->set_charset() my site would be a party of weird, wrongly encoded characters?

I’d like to make sure about my assumption.

2 Answers


  1. It’s quite simple. Set mysqli_set_charset() to the value you expect your data to be encoded in. So if the data in your table’s columns is stored in utf8mb4 then use that charset for the connection.

    You cannot set the default charset in mysqli. The mysqli extension will actually use the default value that MySQL provides for its clients. This is why it’s recommended to always set the charset using mysqli_set_charset(). Unless you are dealing with some legacy database, always set your charset to utf8mb4 which covers the widest range of characters.
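    A minimal sketch of a connection helper that always sets the charset explicitly. The function name `connectWithCharset` and all credentials are hypothetical placeholders, not part of the question; only `mysqli_report()`, the `mysqli` constructor and `set_charset()` are real mysqli calls.

    ```php
    <?php
    // Hypothetical helper: host, user, password and database name are placeholders.
    function connectWithCharset(
        string $host,
        string $user,
        string $pass,
        string $db,
        string $charset = 'utf8mb4'
    ): mysqli {
        // Make mysqli throw exceptions instead of failing silently.
        mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

        $mysqli = new mysqli($host, $user, $pass, $db);

        // Declare the encoding of every string sent and received on this connection.
        $mysqli->set_charset($charset);

        return $mysqli;
    }
    ```

    For a site that is still on latin1, the call would pass `'latin1'` as the last argument, and switch to the `'utf8mb4'` default once the data has been migrated.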

  2. mysqli::set_charset() sets the connection’s charset, which means "all the strings that I send through this connection will use this encoding, and I expect that encoding back as well". You need to match this to the encoding that you are using on the PHP side.

    That said, even if the current setting is wrong, you may wind up with broken data if you change it from its current value. This is because in some situations the data that gets mangled in transit from your application to your DB gets un-mangled in the same way on the way back, so long as the settings stay consistent.
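    The round trip described above can be simulated without a database. This is only a sketch: it assumes the mbstring extension is available, and it uses `mb_convert_encoding()` with ISO-8859-1 as a stand-in for the conversion a MySQL server performs on a latin1 connection into a utf8mb4 column (for the bytes used here the two agree).

    ```php
    <?php
    // "é" as UTF-8 bytes, which is what the PHP side actually sends.
    $sent = "\xC3\xA9";

    // Connection declared as latin1, column is utf8mb4: each byte is read as
    // one latin1 character and re-encoded as UTF-8 for storage (mangled).
    $stored = mb_convert_encoding($sent, 'UTF-8', 'ISO-8859-1');

    // Reading through the same latin1 connection reverses the conversion.
    $readLatin1 = mb_convert_encoding($stored, 'ISO-8859-1', 'UTF-8');

    // Reading through a utf8mb4 connection returns the stored bytes unchanged.
    $readUtf8 = $stored;

    echo bin2hex($stored), "\n";      // c383c2a9
    echo bin2hex($readLatin1), "\n";  // c3a9
    echo $readUtf8, "\n";             // Ã©
    ```

    With the wrong but consistent setting the application gets its original bytes back; after switching the connection to utf8mb4 the same row shows up as "Ã©".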

    Before you make any changes, you need to determine which encodings are currently in use and whether the data in your DB is mangled. From there you can plan a path to ensuring that all the encodings match and that your data is correctly encoded and handled at every step, as well as fixing your existing data.
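    One possible way to take that inventory, as a rough sketch: the helper name `reportCharsets` is hypothetical, and it assumes an already open `mysqli` connection. It combines `character_set_name()` with the server's `character_set_*` variables.

    ```php
    <?php
    // Hypothetical helper: collects the charsets in effect for a connection.
    function reportCharsets(mysqli $mysqli): array
    {
        // Charset currently set on the client side of the connection.
        $report = ['connection' => $mysqli->character_set_name()];

        // Server-side view: character_set_client, character_set_connection,
        // character_set_results, character_set_database, and so on.
        $result = $mysqli->query("SHOW VARIABLES LIKE 'character_set%'");
        while ($row = $result->fetch_assoc()) {
            $report[$row['Variable_name']] = $row['Value'];
        }

        return $report;
    }
    ```

    To check whether stored data is mangled, it can help to look at the raw bytes with `SELECT HEX(...)` on a column that contains a known character: "é" is `E9` in latin1, `C3A9` in UTF-8, and `C383C2A9` when it has been double-encoded.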

    As always, refer to the masterpost: UTF-8 all the way through

    Extra thoughts:

    • String encoding is generally not detectable, it is metadata that must be tracked separately.
    • latin1 is nominally ISO-8859-1, but beware its evil twin cp1252 (which is what MySQL’s latin1 actually corresponds to), which stuffs extra symbols into the reserved 8X and 9X byte ranges, notably €.
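
    The difference between the two "twins" shows up on a single byte. A small sketch, assuming the mbstring extension is available:

    ```php
    <?php
    // The same byte decoded under the two encodings.
    $byte = "\x80";

    $asCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252');
    $asLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');

    echo bin2hex($asCp1252), "\n"; // e282ac (U+20AC, the euro sign)
    echo bin2hex($asLatin1), "\n"; // c280   (U+0080, an invisible control character)
    ```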