I'm working on an application whose character encoding handling is a bit of a mess.

The application currently sets php_value default_charset ISO-8859-1, but in places it also emits <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />.

The MySQL charset is Latin-1 (ISO-8859-1), which explains why default_charset is set that way.

There are also many encoding calls scattered throughout the code: utf8_encode, json_encode, and (less prevalent) mb_convert_encoding.

The biggest issue we are seeing is with our mobile app REST API. People submitting emojis and similar characters can cause some really strange behavior, such as fields showing up empty on display.

Is there a standard way of handling encoding that would give us a more uniform approach?

2 Answers


  1. I have been in that hell too (self-created, by the way).

    The lazy textbook answer is: redo the whole software stack in UTF-8. But that isn’t always feasible (finances are tight, time is an issue, etc.).

    My practical advice would be:

    Start with your database and MAKE it understand Unicode (UTF-8). The database is at the core of your application and should be able to store Unicode. This may sound scary, but if it already stores its data correctly as Latin-1, you can easily convert the relevant columns to Unicode.

    The PHP code should not notice the difference.

    From there on, make sure your PHP is using UTF-8:

    php_value default_charset ISO-8859-1 <-- change that
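
    A minimal sketch of what "PHP is using UTF-8" could look like at the start of a request, assuming the application talks to MySQL through PDO. The helper name `utf8Dsn`, the host and the database name are hypothetical placeholders, not something taken from the question.

    ```php
    <?php
    // Hypothetical helper: builds a PDO DSN that requests a utf8mb4 connection.
    function utf8Dsn(string $host, string $dbName): string
    {
        return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $dbName);
    }

    // Runtime equivalent of "php_value default_charset UTF-8" in the server config.
    ini_set('default_charset', 'UTF-8');

    $dsn = utf8Dsn('localhost', 'app_db'); // placeholder host and database name
    // $pdo = new PDO($dsn, $user, $password); // credentials are assumptions, so left commented out
    ```

    Setting the charset in the DSN keeps the connection encoding in one place, so it matches what PHP declares to the browser.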
    

    And last, you’ll have the messy job of gradually going through all the code and removing every conversion to LATIN-1.

    I hope you can set up some kind of test environment, because hacking like this in a production environment is bad for your mental wellbeing. 🙂

    Good luck.

  2. Erwin’s answer is not complete.

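    MySQL’s charset for full UTF-8 is called utf8mb4, not simply utf8. MySQL’s utf8 is an alias for utf8mb3, which stores at most 3 bytes per character, while Emoji need 4. You must use utf8mb4 to allow for Emoji.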
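
    To make the byte counts concrete, here is a small sketch in PHP. The helper name `needsUtf8mb4` is made up for illustration; it only checks whether a string contains a character outside the Basic Multilingual Plane, which is the kind that takes 4 bytes in UTF-8.

    ```php
    <?php
    // Hypothetical helper: true if the text contains a code point above U+FFFF,
    // i.e. one that needs 4 bytes in UTF-8 and therefore utf8mb4 in MySQL.
    function needsUtf8mb4(string $text): bool
    {
        return preg_match('/[\x{10000}-\x{10FFFF}]/u', $text) === 1;
    }

    $emoji  = "\u{1F600}"; // grinning face
    $accent = "\u{00E9}";  // e with acute accent

    var_dump(strlen($emoji));        // int(4)
    var_dump(strlen($accent));       // int(2)
    var_dump(needsUtf8mb4($emoji));  // bool(true)
    var_dump(needsUtf8mb4($accent)); // bool(false)
    ```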

    "ISO-8859-1" is, for practical purposes, the same as MySQL’s "latin1" (which is strictly cp1252). But if you claim that such data is UTF-8, a mess ensues. See "Trouble with UTF-8 characters; what I see is not what I stored", especially the parts on "truncate", which seems to be your main symptom.

    ALTER TABLE can be used in either of two ways to change the charset of individual columns or of all columns. However, if the data has been truncated, it is lost and cannot be recovered without reloading. So I suggest you change the table definitions and connection parameters and start over.
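
    As a rough sketch, these are the two ALTER TABLE forms I am assuming are meant here, written as PHP helpers that only build the SQL text. The function names, the table `comments` and the column `body` are hypothetical, and the names are interpolated directly, so they must come from trusted code and never from user input.

    ```php
    <?php
    // Way 1 (assumed): the columns really hold latin1 text. MySQL re-encodes the
    // stored characters while changing the charset of every text column.
    function convertTableSql(string $table): string
    {
        return "ALTER TABLE `$table` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
    }

    // Way 2 (assumed): the column is declared latin1 but actually holds UTF-8 bytes.
    // Passing through a binary type changes the label without touching the bytes.
    // Note: MODIFY drops attributes such as NOT NULL unless they are restated.
    function relabelColumnSql(string $table, string $column, int $length): array
    {
        return [
            "ALTER TABLE `$table` MODIFY `$column` VARBINARY($length)",
            "ALTER TABLE `$table` MODIFY `$column` VARCHAR($length) CHARACTER SET utf8mb4",
        ];
    }

    echo convertTableSql('comments'), ";\n";
    echo implode(";\n", relabelColumnSql('comments', 'body', 255)), ";\n";
    ```

    Which form applies depends on what the bytes in the column really are, so it is worth trying either one on a copy of the table first.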

    Do not use any mb conversion routines; they will make it even harder to debug.
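
    A short illustration of why stray conversions hurt, assuming the mbstring extension is loaded. It applies a Latin-1 to UTF-8 conversion to text that is already UTF-8, which is what happens when such calls are sprinkled through the code.

    ```php
    <?php
    // "é" as correct UTF-8 is the two bytes C3 A9.
    $utf8 = "\xC3\xA9";

    // A conversion that wrongly assumes ISO-8859-1 input re-encodes each of those
    // bytes as its own character, producing the mojibake "Ã©".
    $double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

    var_dump(bin2hex($utf8));   // string(4) "c3a9"
    var_dump(bin2hex($double)); // string(8) "c383c2a9"

    // The result is still valid UTF-8, so nothing complains and the damage is silent.
    var_dump(mb_check_encoding($double, 'UTF-8')); // bool(true)
    ```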
