What is a PHP regex to check whether an uploaded file name contains German umlauts?
File name: Screenshot_Erdös.png
I tried the code below, but it is not working:
if ( preg_match('(?<![äöüÄÖÜßw])([äöüÄÖÜßw]+)(?![äöüÄÖÜßw])', $file_name )){
$file['error'] = __( "WARNING: Invalid file name. German umlauts are not allowed.", 'wp-file' );
}
2 Answers
You can try this
There are two ways to produce an "a with umlaut (or diaeresis)" in Unicode:
U+00E4 ä LATIN SMALL LETTER A WITH DIAERESIS, a single precomposed character;
U+0061 a LATIN SMALL LETTER A, followed by U+0308 (◌̈) COMBINING DIAERESIS.
All the other vowels "e i o u", and "y" too, are in the same situation: each can be produced in these two ways.
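The difference between the two spellings is easy to see at the byte level. The following is only a small sketch; the variable names are made up for illustration, and the "\u{...}" string escapes assume PHP 7 or later.

```php
<?php
// Two spellings of the same visible letter "ä" (illustrative variable names).
$precomposed = "\u{00E4}";  // U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
$decomposed  = "a\u{0308}"; // U+0061 followed by U+0308 COMBINING DIAERESIS

// The UTF-8 bytes differ, so a plain string comparison fails.
echo bin2hex($precomposed), "\n";       // c3a4
echo bin2hex($decomposed), "\n";        // 61cc88
var_dump($precomposed === $decomposed); // bool(false)
```

Both strings render the same way on screen, which is why a pattern that lists only the precomposed letter can miss a file name typed or generated with the combining form.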
To deal with this, you can either account for both possibilities in your pattern or use the Normalizer class from the intl extension to convert the string to NFC first.
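As a rough sketch of the normalization route: the helper name to_nfc is hypothetical, and falling back to the unchanged input when the intl extension is not loaded is an assumption made here so that the snippet runs everywhere, not something the answer prescribes.

```php
<?php
// Hypothetical helper: returns the NFC form of $s when ext-intl is loaded,
// otherwise returns $s unchanged (assumed fallback for this sketch).
function to_nfc(string $s): string
{
    if (!class_exists('Normalizer')) {
        return $s; // intl not available: nothing to normalize with
    }
    $normalized = Normalizer::normalize($s, Normalizer::FORM_C);
    // normalize() returns false on failure (for example invalid UTF-8).
    return $normalized === false ? $s : $normalized;
}

$decomposed = "Erdo\u{0308}s"; // "o" followed by COMBINING DIAERESIS
$nfc = to_nfc($decomposed);    // with intl loaded: "Erd\u{00F6}s"
echo bin2hex($nfc), "\n";
```

After this step, a pattern only needs to list the precomposed letters to catch these vowels.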
Another thing to take into account: when you have to deal with multibyte characters (which is the case for accented characters in UTF-8), you need to inform the regex engine. Otherwise it reads the subject string and the pattern byte by byte instead of code point by code point.
Consider this character class:
[ä]
(with the "readymade" small A with diaeresis). ä is encoded with two bytes in UTF-8: C3 A4. That means that, by default, a pattern with this character class succeeds if either of these two bytes is found in the subject string. But that doesn't mean that the subject string contains ä. For example, the pattern also succeeds against a subject that contains ↤, because U+21A4 ↤ LEFTWARDS ARROW FROM BAR is encoded with the bytes E2 86 A4 and the byte A4 is found.
To inform the regex engine that the strings (the pattern and the subject) have to be read code point by code point, you can start the pattern with (*UTF) or use the u modifier.
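The byte-versus-code-point behaviour described above can be reproduced with a few lines. This is a sketch only: the variable names are illustrative, the ~ delimiter is an arbitrary choice, and the (*UTF) spelling assumes a PCRE2-based PHP build (older PCRE versions may expect a different spelling).

```php
<?php
$arrow = "\u{21A4}"; // ↤, UTF-8 bytes E2 86 A4
$aUml  = "\u{00E4}"; // ä, UTF-8 bytes C3 A4

// Byte mode: the class [ä] means "byte C3 or byte A4", so the arrow matches.
$byteMode = preg_match("~[{$aUml}]~", $arrow);

// UTF mode via the u modifier: the class means "code point U+00E4",
// so the arrow no longer matches.
$withModifier = preg_match("~[{$aUml}]~u", $arrow);

// UTF mode requested at the start of the pattern (assumes the PCRE build
// accepts this spelling).
$withVerb = preg_match("~(*UTF)[{$aUml}]~", $arrow);

var_dump($byteMode);     // int(1), a false positive
var_dump($withModifier); // int(0)
var_dump($withVerb);
```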
To conclude, a pattern to match a diaeresis can be written as a character class, in UTF mode, that lists the letters äëïöüÿ together with \N{U+0308} and \N{U+00A8}, where:
\N{U+0308} stands for the combining diaeresis and \N{U+00A8} for the diaeresis alone;
äëïöüÿ are "readymade" characters from the Unicode block U+0080 -> U+00FF Latin-1 Supplement.
Uppercase letters are taken into account with the i modifier. With an NFC-normalized string, listing the "readymade" letters is enough for these vowels.
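Putting the pieces together, a file-name check could look roughly like the sketch below. The function name has_diaeresis is hypothetical, the normalization step is optional and only runs when the intl extension is loaded, and the code points are written as \x{...} escapes rather than \N{U+...} on the assumption that this form is accepted by more PCRE versions.

```php
<?php
// Hypothetical helper, not taken from the answer: true when $name contains
// a letter with diaeresis, a combining diaeresis, or a standalone diaeresis.
function has_diaeresis(string $name): bool
{
    // Optional NFC normalization (assumption: skipped when intl is missing).
    if (class_exists('Normalizer')) {
        $nfc = Normalizer::normalize($name, Normalizer::FORM_C);
        if ($nfc !== false) {
            $name = $nfc;
        }
    }
    // ä ë ï ö ü ÿ, then U+0308 (combining) and U+00A8 (standalone).
    // u = read code point by code point, i = also match uppercase letters.
    $pattern = '~[\x{E4}\x{EB}\x{EF}\x{F6}\x{FC}\x{FF}\x{0308}\x{A8}]~ui';
    // preg_match() returns false on invalid UTF-8; treated as "no match" here.
    return preg_match($pattern, $name) === 1;
}

var_dump(has_diaeresis("Screenshot_Erd\u{00F6}s.png"));  // bool(true)
var_dump(has_diaeresis("Screenshot_Erdo\u{0308}s.png")); // bool(true)
var_dump(has_diaeresis("Screenshot.png"));               // bool(false)
```

Note that this only covers the diaeresis; ß from the question's character class would have to be added to the pattern separately if it should also be rejected.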