
I’m translating user-submitted strings from UTF-8 to ASCII-Printable:

$str = 'Thê qúïck 😈 brõwn fõx júmps?😈 Óvér thé lázy dõg?😈';

$out = iconv('UTF-8', 'ASCII//TRANSLIT', $str);


$out = 'The quick ? brown fox jumps?? Over the lazy dog??';

I want the extra ? question marks from $out removed.

if ($out !== $str && strpos($out, '?') !== false) {
    // The input string was modified and contains at least one question mark
    // Not even really sure where to begin
    // Do we need to compare the position of every character from the
    // original string to every position of the new string and replace
    // where the original string did not contain a question mark?
    // That's all I can think of, but there has to be a better way.
}

I want to keep all //TRANSLIT characters, including the few in the example above, e.g. áéïõú = aeiou. There is no other nuance to this question. I think it boils down to a string comparison and replacement problem.

I’m not necessarily looking for someone to write the entire code, just a pointer in the right direction of how you’d tackle this.



  1. Chosen as BEST ANSWER

    This works for me, although I'm sure there are better solutions that people can come up with.

    $str = 'Thê qúïck 😈 brõwn fõx júmps?😈 Óvér thé lázy dõg?😈';
    $out = 'The quick ? brown fox jumps?? Over the lazy dog??';


    var_dump(strip_iconv_question_marks($str, $out));
    // string(49) "The quick   brown fox jumps?  Over the lazy dog? "


    /**
     * strip_iconv_question_marks - Remove question marks left behind by iconv()
     * after translating UTF-8 strings to ASCII strings
     * @param string $str_utf8
     * @param string $str_ascii
     * @return string
     */
    function strip_iconv_question_marks($str_utf8, $str_ascii) {
        $arr_utf8 = mb_str_split($str_utf8);
        $arr_ascii = mb_str_split($str_ascii);
        // Assumes both strings line up character for character;
        // min() avoids reading past the end of the shorter array
        $count = min(count($arr_utf8), count($arr_ascii));
        for ($i = 0; $i < $count; $i++) {
            if ($arr_ascii[$i] === '?') {
                if ($arr_utf8[$i] !== '?') {
                    $arr_ascii[$i] = ' '; // Prefer blank space over removal
                }
            }
        }
        return implode($arr_ascii);
    }

    For PHP < 7.4.0

    if (!function_exists('mb_str_split')) {
        function mb_str_split($str, $len = 1) {
            $arr = [];
            $cnt = mb_strlen($str, 'UTF-8');
            for ($i = 0; $i < $cnt; $i += $len) { // Step by $len so chunks do not overlap
                $arr[] = mb_substr($str, $i, $len, 'UTF-8');
            }
            return $arr;
        }
    }

  2. Here is a solution based on transliterator_transliterate():

    $str = transliterator_transliterate('Latin-ASCII', 'Thê qúïck 😈 brõwn fõx júmps?😈 Óvér thé lázy dõg?😈');
    $str = preg_replace('/[\x80-\xFF]/', '', $str);
    echo $str;


    The quick  brown fox jumps? Over the lazy dog?

    Note that the emoji are kept by transliterator_transliterate(), so I used a regex to remove all the remaining non-ASCII characters.
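
One caveat with comparing the two strings position by position: //TRANSLIT can expand a single character into several ASCII characters, after which the positions no longer line up. A minimal sketch of a workaround, assuming it is acceptable to call iconv() once per character, is to transliterate each character on its own and drop any that comes back as a lone question mark. The function name translit_drop_unknown and its $replacement parameter are made up for this example, and the exact transliterations still depend on the platform's iconv implementation and locale.

```php
<?php
// Hypothetical helper, not taken from the answers above: transliterate one
// character at a time so every '?' can be traced back to its source character.
function translit_drop_unknown(string $utf8, string $replacement = ''): string
{
    // The /u flag makes preg_split() cut between UTF-8 code points;
    // it returns false on invalid UTF-8, which "?: []" turns into "no characters".
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $out = '';
    foreach ($chars as $char) {
        // @ hides the notice some iconv builds raise for unconvertible input.
        $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $char);

        if ($ascii === false || ($ascii === '?' && $char !== '?')) {
            // No ASCII equivalent was found: drop the character or substitute.
            $out .= $replacement;
        } else {
            // Keep the transliteration, which may be longer than one character.
            $out .= $ascii;
        }
    }
    return $out;
}
```

Passing ' ' as the second argument substitutes a blank space instead of removing the character, which matches the preference in the accepted answer. Question marks typed by the user are kept either way.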
