I am having trouble allowing all English/Latin-based characters (including accented ones) while disallowing Chinese/Russian characters.
The first version I had was as follows:
strlen($values['person_name']) != mb_strlen($values['person_name'], 'utf-8')
This worked fine initially, but it stopped working once Icelandic/Czech names came into play.
The second version I had was as follows:
preg_match("~^[a-zÀ-ÿ]['a-zÀ-ÿ -]*$~i", $values['person_name'])
This seemed to work for the majority of cases, but it fails on this test name:
Eliška Koňaříková
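A likely explanation for that failure: the second version has no `u` modifier, so PCRE reads `À-ÿ` in a UTF-8 source file as raw bytes rather than as the characters U+00C0 to U+00FF. `é` is encoded as the bytes C3 A9, which happen to fall inside that byte class, while `š` is C5 A1 and does not. Adding `u` (as in one of the attempts) does not help either, because `À-ÿ` then means Latin-1 only and `š` is U+0161. A small check, assuming the file is saved as UTF-8:

```php
<?php
// The question's second pattern, unchanged: no "u" modifier, so it matches bytes.
$byteRange = "~^[a-zÀ-ÿ]['a-zÀ-ÿ -]*$~i";
// Same pattern with "u": À-ÿ now means the code points U+00C0..U+00FF.
$latin1Range = "~^[a-zÀ-ÿ]['a-zÀ-ÿ -]*$~iu";

foreach (['José', 'Eliška Koňaříková'] as $name) {
    // bin2hex() shows the raw UTF-8 bytes that the byte-mode pattern sees.
    printf(
        "%s (%s): bytes=%d, latin1=%d\n",
        $name,
        bin2hex($name),
        preg_match($byteRange, $name),
        preg_match($latin1Range, $name)
    );
}
```

Both variants accept `José` and reject the Czech name, which is consistent with the behaviour described above.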
I have tried the following as well without any luck:
preg_match("/[^\w ]/u", $values['person_name']) //does not allow š
preg_match("/\PL/u", $values['person_name']) //does not allow š
preg_match("/^[a-zA-Z\s,.'-\pL]+$/u", $values['person_name']) //allows š, but also allows 書
preg_match("/^[\s,.'-]*\p{L}[\p{L}\s,.'-]*$/u", $values['person_name']) //allows š, but also allows 書
preg_match("/[^a-zA-Z0-9àâáçéèèêëìîíïôòóùûüÂÊÎÔúÛÄËÏÖÜÀÆæÇÉÈŒœÙñý,. ]/u", $values['person_name']) //allows š, but also allows 書
preg_match("~^[a-zÀ-ÿ]['a-zÀ-ÿ -]*$~iu", $values['person_name']) //does not allow š
preg_match("/^[\p{L}-]*$/u", $values['person_name']) //allows š, but also allows 書
preg_match("/([\w ]{2,})/u", $values['person_name']) //allows š, but also allows 書
preg_match('/[^\p{Latin}0-9€, !"§$%&\/()=#|<>]/u', $values['person_name']) //allows š, but also allows 書
All of the above either rejected the test name or accepted Chinese characters.
I believe the best route for me would be to revert to the check that worked for most characters (everything except the Czech names):
preg_match("~^[a-zÀ-ÿ]['a-zÀ-ÿ -]*$~i", $values['person_name'])
and manually add the Czech characters it rejects, such as š, ň, ř, etc.
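Instead of adding š, ň, ř one by one, a code-point range can cover them. A minimal sketch, assuming "Latin" means Basic Latin, Latin-1 Supplement (minus × and ÷) and Latin Extended-A (U+0100 to U+017F, which contains č ď ě ň ř š ť ů ž), with space, apostrophe and hyphen as separators. The function name `is_latin_name` is hypothetical; the `u` modifier is what makes `\x{...}` refer to code points.

```php
<?php
// Hypothetical helper: accepts names made of letters from Basic Latin,
// Latin-1 Supplement (excluding U+00D7 × and U+00F7 ÷) and Latin Extended-A,
// separated by spaces, apostrophes or hyphens. Expects UTF-8 input.
function is_latin_name(string $name): bool
{
    $letter = 'a-zA-Z\x{00C0}-\x{00D6}\x{00D8}-\x{00F6}\x{00F8}-\x{017F}';
    // First character must be a letter; later ones may also be ' - or space.
    // "D": $ must not match before a trailing newline. "u": UTF-8 mode.
    $pattern = "~^[{$letter}][{$letter}' -]*$~Du";
    // preg_match() returns false on invalid UTF-8, which is treated as a reject.
    return preg_match($pattern, $name) === 1;
}

var_dump(is_latin_name('Eliška Koňaříková')); // bool(true)
var_dump(is_latin_name('書'));                 // bool(false)
```

The trade-off is that anything outside these blocks (for example Latin Extended-B or Vietnamese letters) is still rejected, so the ranges may need widening later.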
Is there a cleaner solution than manually having to specify each of these characters?
Answers
Maybe it's better to replace the characters. This is only an example of doing that, not a complete function:
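A minimal sketch of that replacement idea (not the answer's own code), assuming it means mapping accented letters to their ASCII base letters with `strtr()` and then running a plain ASCII check. `ACCENT_MAP` and `is_name_after_replacing` are hypothetical names, and the map is deliberately incomplete.

```php
<?php
// Hypothetical, incomplete map from accented letters to ASCII base letters.
// It would need an entry for every accented letter you want to accept.
const ACCENT_MAP = [
    'á' => 'a', 'č' => 'c', 'ď' => 'd', 'é' => 'e', 'ě' => 'e',
    'í' => 'i', 'ň' => 'n', 'ó' => 'o', 'ř' => 'r', 'š' => 's',
    'ť' => 't', 'ú' => 'u', 'ů' => 'u', 'ý' => 'y', 'ž' => 'z',
    'Č' => 'C', 'Ř' => 'R', 'Š' => 'S', 'Ž' => 'Z',
];

// Replace the known accented letters, then allow only ASCII letters,
// spaces, apostrophes and hyphens in whatever remains.
function is_name_after_replacing(string $name): bool
{
    $plain = strtr($name, ACCENT_MAP);
    return preg_match("~^[a-z]['a-z -]*$~iD", $plain) === 1;
}

var_dump(is_name_after_replacing('Eliška Koňaříková')); // bool(true)
var_dump(is_name_after_replacing('書'));                 // bool(false)
```

Any letter missing from the map is rejected, so this carries the same maintenance burden as listing characters in the regex.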
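preg_match() allows you to use Unicode scripts: \p{Latin}
Match a sequence of Latin characters: \p{Latin}+
Anchor it at the start and end of the string: ^\p{Latin}+$
Add delimiters: (^\p{Latin}+$)
Add the D modifier so $ does not match before a trailing newline: (^\p{Latin}+$)D
Add the u modifier so the pattern and the subject are treated as UTF-8: (^\p{Latin}+$)Du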
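A sketch of the Unicode-script pattern with the `D` and `u` modifiers in use. One assumption is added on top of it: `\p{Latin}` does not match spaces, apostrophes or hyphens, so a full name such as "Eliška Koňaříková" needs those characters added to a class. `\p{M}` (combining marks) is also an addition, for input typed in decomposed form where `š` is `s` followed by a combining caron.

```php
<?php
// \p{Latin}: characters of the Latin script only.
// "D": $ does not match before a final "\n". "u": pattern and subject are UTF-8.
$singleWord = '(^\p{Latin}+$)Du';

// Hypothetical extension for full names: Latin letters plus combining marks,
// apostrophes, hyphens and spaces after a leading Latin letter.
$fullName = "(^\p{Latin}[\p{Latin}\p{M}' -]*$)Du";

foreach (['Koňaříková', 'Eliška Koňaříková', '書', 'Иван'] as $name) {
    printf(
        "%s: word=%d, name=%d\n",
        $name,
        preg_match($singleWord, $name),
        preg_match($fullName, $name)
    );
}
```

The single-word pattern rejects the full test name only because of the space; the Chinese and Russian inputs fail both patterns.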
For transliteration, check the Transliterator class. It is part of PHP's standard Unicode extension, ext/intl, and allows extensive transformations of Unicode strings.
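A minimal sketch of using `Transliterator` to reduce a name to plain ASCII, assuming ext/intl is installed (the function returns `null` when it is not). The compound ID `Any-Latin; Latin-ASCII` first converts other scripts to Latin where ICU has rules for them, then strips diacritics. `to_ascii` is a hypothetical name.

```php
<?php
// Returns an ASCII approximation of $text, or null if ext/intl is missing
// or the transform cannot be created or applied.
function to_ascii(string $text): ?string
{
    // Transliterator::class resolves to a string even if the class is absent.
    if (!class_exists(Transliterator::class)) {
        return null;
    }
    // "Any-Latin": other scripts to Latin; "Latin-ASCII": drop accents.
    $transliterator = Transliterator::create('Any-Latin; Latin-ASCII');
    if ($transliterator === null) {
        return null;
    }
    $result = $transliterator->transliterate($text);
    return $result === false ? null : $result;
}

var_dump(to_ascii('Eliška Koňaříková'));
```

Note that this converts rather than validates: Chinese or Russian input is transliterated into Latin letters instead of being rejected, so it suits building slugs or search keys better than input validation.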
The first word in the example is Amharic and was left untransformed: even ICU has limits, depending on its version.
More about the ICU Script Transliterations: https://unicode-org.github.io/icu/userguide/transforms/general/#scriptlanguage