
For User Agents and image EXIF data, my system tries to convert any non-ASCII UTF-8 characters to ASCII, using iconv().

However, sometimes I get the following error:

PHP Warning [8]: iconv(): Detected an illegal character in input string

For examples like these:

iconv('UTF-8', 'ASCII//TRANSLIT', 'Mozilla/5.0 (iPhone; CPU OS 10_15_5 (Ergänzendes Update) like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/12.1.1 Mobile/14E304 Safari/605.1.15');

iconv('UTF-8', 'ASCII//TRANSLIT', 'Ïðîãðàììà öèôðîâîé îáðàáîòêè èçîáðàæåíèé êîìïàíèè ACD Systems');

And the result becomes an empty string.

However, when I copy the above and run it manually (on the same server), it works: I get no error, and the characters are converted to "?".

For years I have been trying many different things, such as different encodings, using "IGNORE" instead of "TRANSLIT", using mb_convert_encoding, etc.
But it is really hard to debug or fix this if I can't capture the real input that causes the issue, and I don't know what else I can do to fix it.

What can I do, so that whatever input is provided to iconv(), any non-ASCII characters will be converted to a question mark, without failing?

2 Answers


  1. Illegal UTF-8 byte sequences can easily arise through mistakes. An example:

    $currencies = '€$';
    // substr() counts bytes: this takes the second byte of the three-byte "€"
    $str = "äöü|" . substr($currencies, 1, 1) . "|def";
    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
    // $ascii === false, plus Notice: iconv(): Detected an illegal character in input string
    

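    substr() works on bytes, so here it cuts the three-byte "€" apart and leaves a lone
    byte that is not valid UTF-8. For UTF-8 strings, mb_substr() must be used, not substr().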
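
    The difference between the two functions can be checked directly. A minimal sketch, assuming the mbstring extension is loaded and the source file is saved as UTF-8 (the variable names are only for illustration):

    ```php
    <?php
    $currencies = '€$';   // "€" is three bytes in UTF-8: E2 82 AC

    $byte = substr($currencies, 1, 1);              // second BYTE of the string
    $char = mb_substr($currencies, 1, 1, 'UTF-8');  // second CHARACTER of the string

    $hex     = bin2hex($byte);                      // "82"
    $isValid = mb_check_encoding($byte, 'UTF-8');   // false: a lone continuation byte
    // $char is "$", the character that was actually intended
    ```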

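    With iconv(), //IGNORE can be appended to //TRANSLIT so that illegal characters are dropped instead of aborting the conversion (how these suffixes behave depends on the underlying iconv implementation).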

    $ascii = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $str);
    //$ascii: string(11) ""a"o"u||def"
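
    If the only requirement is that the conversion never fails and every non-ASCII character ends up as "?", iconv() can be bypassed altogether. The following is a sketch of a different technique, a byte-level regular expression, not the iconv() approach shown here. The function name asciiOrQuestionMark() is hypothetical, and unlike //TRANSLIT it does not transliterate anything.

    ```php
    <?php
    /**
     * Hypothetical helper (not from the original post): replace every
     * non-ASCII character, and every stray invalid byte, with "?".
     */
    function asciiOrQuestionMark(string $input): string
    {
        // No /u modifier: the pattern runs in byte mode, so malformed
        // UTF-8 in $input cannot cause a "malformed UTF-8" regex error.
        $pattern = '/[\xC2-\xDF][\x80-\xBF]'       // 2-byte sequence
                 . '|[\xE0-\xEF][\x80-\xBF]{2}'    // 3-byte sequence
                 . '|[\xF0-\xF4][\x80-\xBF]{3}'    // 4-byte sequence
                 . '|[\x80-\xFF]/';                // any leftover high byte

        // preg_replace() returns null only on an internal PCRE error.
        return preg_replace($pattern, '?', $input) ?? '';
    }
    ```

    With this sketch, a well-formed multi-byte character becomes a single "?", and a broken byte such as the one produced by substr() above also becomes a single "?".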
    

    Finding such illegal characters in strings is not easy, because the usual debug output distorts or drops them.
    For such problems I use a special debug class that can reproducibly display strings, including ones with illegal UTF-8.

    debug::writeUni($str);
    //Output: \u{e4}\u{f6}\u{fc}|\x82|def
    

    This output can be copied and pasted back into PHP code:

    $str2 = "\u{e4}\u{f6}\u{fc}|\x82|def";
    var_dump($str === $str2); //bool(true)
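
    The debug class itself is not shown here, so the following is only a rough stand-in under stated assumptions: escapeForLog() is a hypothetical name, and the mbstring extension (PHP 7.2+ for mb_ord()) is assumed. It could be used to log the exact bytes passed to iconv() whenever the conversion fails, which makes the real failing input reproducible.

    ```php
    <?php
    /**
     * Hypothetical stand-in for a debug helper: printable ASCII is kept,
     * valid non-ASCII characters become \u{hex}, all other bytes become \xhh.
     * A literal backslash in the input is left as-is.
     */
    function escapeForLog(string $input): string
    {
        // Byte mode (no /u), so invalid UTF-8 cannot make the regex fail.
        $pattern = '/[\xC2-\xDF][\x80-\xBF]'
                 . '|[\xE0-\xEF][\x80-\xBF]{2}'
                 . '|[\xF0-\xF4][\x80-\xBF]{3}'
                 . '|[^\x20-\x7E]/';

        $escaped = preg_replace_callback($pattern, function (array $m): string {
            $seq = $m[0];
            if (strlen($seq) > 1 && mb_check_encoding($seq, 'UTF-8')) {
                // Well-formed multi-byte character: show its code point.
                return sprintf('\u{%x}', mb_ord($seq, 'UTF-8'));
            }
            // Control character or invalid sequence: escape byte by byte.
            $out = '';
            foreach (str_split($seq) as $byte) {
                $out .= sprintf('\x%02x', ord($byte));
            }
            return $out;
        }, $input);

        return $escaped ?? '';
    }
    ```

    For typical inputs, the result can be pasted into a double-quoted PHP string to rebuild the original bytes.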
    
  2. Good morning,
    My problem persisted because some characters are not recognized by iconv. I tried several code snippets from various groups, but what actually worked is the following:

    // Note: character converter to UTF-8

    public function ConvertToUTF8($text)
    {
        $encoding = mb_detect_encoding($text.'x', mb_detect_order(), false);
        if ($encoding == "UTF-8")
        {
            // Convert letter by letter (substr() steps one byte at a time)
            $i    = 0;
            $conv = '';
            do
            {
                $letra = substr($text, $i, 1);
                $conv .= iconv(mb_detect_encoding($letra, mb_detect_order(), true), "UTF-8//IGNORE", $letra);
                $i++;
            } while ($i < strlen($text));
            $text = $conv;
        }
        else if ($encoding == 'ISO-8859-1')
        {
            // mb_convert_encoding() takes the target encoding first, then the source
            $text = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
        }
        else if ($encoding == 'ASCII')
        {
            $text = mb_convert_encoding($text, "UTF-8");
        }
        $out = iconv(mb_detect_encoding($text.'x', mb_detect_order(), false), "UTF-8//TRANSLIT//IGNORE", $text);

        return $out;
    } // End of module
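
    A shorter variant of the same idea might validate the input instead of detecting its encoding. This is only a sketch under a loud assumption: anything that is not valid UTF-8 is treated as ISO-8859-1, which may be wrong for other legacy encodings. ensureUtf8() and its $fallback parameter are hypothetical names, and the mbstring extension is assumed.

    ```php
    <?php
    /**
     * Hypothetical helper: return $text as valid UTF-8.
     * Assumption: input that is not valid UTF-8 is encoded in $fallback.
     */
    function ensureUtf8(string $text, string $fallback = 'ISO-8859-1'): string
    {
        if (mb_check_encoding($text, 'UTF-8')) {
            return $text;   // already valid UTF-8 (plain ASCII included)
        }
        // Target encoding comes first, source encoding second.
        return mb_convert_encoding($text, 'UTF-8', $fallback);
    }
    ```

    The result is always valid UTF-8, so a later iconv('UTF-8', ...) call no longer sees illegal input bytes, although text from a wrongly guessed encoding will still come out as the wrong characters.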
    