I’m translating user-submitted strings from UTF-8 to ASCII-Printable:
$str = 'Thê qúïck 😈 brõwn fõx júmps?😈 Óvér thé lázy dõg?😈';
$out = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
var_dump($out);
$out = 'The quick ? brown fox jumps?? Over the lazy dog??';
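I want the extra ? question marks removed from $out, i.e. the ones iconv() substitutes for characters it cannot transliterate, while keeping the question marks that were already in the original string.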
if ($out !== $str && strpos($out, '?') !== false) {
    // The input string was modified and contains at least one question mark
    //
    // Not even really sure where to begin
    //
    // Do we need to compare the position of every character from the
    // original string to every position of the new string and replace
    // where the original string did not contain a question mark?
    //
    // That's all I can think of, but there has to be a better way.
}
I want to keep all //TRANSLIT characters, including those few included in the example above, e.g. áéïõú = aeiou. There is no other nuance to this question. I think it boils down to a string comparison and replace question.
I’m not necessarily looking for someone to write the entire code, just a pointer in the right direction of how you’d tackle this.
2 Answers
This works for me, although I'm sure there are better solutions that people can come up with.
Output
Function
For PHP < 7.4.0
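One way to approach this, as a minimal sketch rather than this answer's original code: transliterate one character at a time and drop any non-ASCII character for which iconv() fails or falls back to ?. Question marks that were already in the input are single-byte ASCII and pass through untouched. The function name translit_drop_unmapped() is hypothetical, and the sketch assumes an iconv implementation that uses ? as its fallback (glibc and GNU libiconv do; transliteration results can also depend on the current locale). It splits with preg_split() and the /u flag instead of mb_str_split(), so it should also run on PHP versions before 7.4.0.

```php
<?php
// Hypothetical helper: transliterate UTF-8 to ASCII, dropping characters
// that iconv() cannot map instead of leaving a "?" in their place.
function translit_drop_unmapped(string $str): string
{
    // With the /u flag the empty pattern splits between code points,
    // not between bytes. Returns false if $str is not valid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return '';
    }

    $out = '';
    foreach ($chars as $char) {
        if (strlen($char) === 1) {
            // One byte in valid UTF-8 means ASCII: keep it, including a real "?".
            $out .= $char;
            continue;
        }
        // @ silences the notice that some iconv builds raise on failure.
        $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $char);
        // Assumption: "?" is the fallback for untranslatable characters.
        if ($ascii !== false && $ascii !== '' && $ascii !== '?') {
            $out .= $ascii;
        }
    }
    return $out;
}
```

Calling translit_drop_unmapped($str) on the string from the question should leave only the two question marks that were typed by the user. Because each character is converted on its own, input in decomposed form (a base letter followed by a combining accent) may behave differently from precomposed input.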
Here is a solution based on transliterator_transliterate():
Output:
Note that the emoji are kept by transliterator_transliterate(), so I used a regex to remove all the remaining non-ASCII characters.
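The two steps described here can be sketched roughly as follows. This is an illustration, not the answer's original code: the function name to_printable_ascii() is hypothetical, the transliterator ID 'Any-Latin; Latin-ASCII' is one reasonable choice among several, and the intl extension is assumed to be installed. Since the question asks for printable ASCII, the regex in this sketch also removes control characters, which goes slightly beyond stripping non-ASCII bytes.

```php
<?php
// Hypothetical helper; requires the intl extension.
function to_printable_ascii(string $str): string
{
    // Step 1: let ICU turn accented letters into their ASCII equivalents.
    // Characters without an ASCII form (such as emoji) come back unchanged.
    $latin = transliterator_transliterate('Any-Latin; Latin-ASCII', $str);
    if ($latin === false) {
        return '';
    }

    // Step 2: remove every byte outside space (0x20) through tilde (0x7E).
    // No /u flag is needed: all bytes of a multi-byte UTF-8 sequence
    // are above 0x7F, so the whole character is removed.
    return (string) preg_replace('/[^\x20-\x7E]/', '', $latin);
}
```

Unlike the iconv() approach, nothing is ever replaced with ?, so there is no need to tell the user's own question marks apart from substituted ones.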