I have record names that mix Cyrillic and Latin words, symbols, spaces, digits, etc.
I need to preg_match (PHP) only the Latin part, with any symbols in any combination.
Test set:
БлаблаБла Uty-223
Блабла (бла.)Бла CAROP-C
Бла бла ST.MORITZ
Бла бла RAMIRO2-TED
LA PLYSGNE 1 H - 001
(Блабла stands for arbitrary Cyrillic words; their content doesn't matter.)
So I tried this pattern:
/[-0-9a-zA-Z.]+/
But [Блабла (бла.)Бла CAROP-C] and [LA PLYSGNE 1 H - 001] are not matched as a whole string.
Next I tried to write a more flexible pattern:
/[-0-9a-zA-Z]+(?:\.)?+(?:\s+)?+[-0-9a-zA-Z]+/
But there is still a problem matching [LA PLYSGNE 1 H - 001].
Any idea how this can be solved?
Thanks.
2 Answers
If the . and - cannot occur at the beginning or end, you can start the match with [0-9a-zA-Z] and optionally repeat one of the chars listed in the character class, followed again by [0-9a-zA-Z].
\b is a word boundary preventing a partial word match
\h matches a horizontal whitespace character
See a regex101 demo.
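The pattern itself did not survive in this copy of the answer, so here is a minimal sketch of what the description above could look like in PHP. The exact pattern, the function name latinPartWordBoundary and the choice of the u modifier are my assumptions, not necessarily what the answerer posted.

```php
<?php
// Hypothetical helper (the name is illustrative): returns the first Latin
// run found in $name, or null when there is none.
function latinPartWordBoundary(string $name): ?string
{
    // \b                        word boundary, so the match cannot start mid-word
    // [0-9a-zA-Z]+              one or more ASCII letters or digits
    // (?:[.\h-]+[0-9a-zA-Z]+)*  optionally: separators (dot, horizontal
    //                           whitespace, hyphen) followed by letters/digits again
    // \b                        word boundary at the end
    // The u modifier is an assumption: the subject strings are UTF-8.
    $pattern = '/\b[0-9a-zA-Z]+(?:[.\h-]+[0-9a-zA-Z]+)*\b/u';

    return preg_match($pattern, $name, $m) === 1 ? $m[0] : null;
}

// Test set from the question.
$names = [
    'БлаблаБла Uty-223',
    'Блабла (бла.)Бла CAROP-C',
    'Бла бла ST.MORITZ',
    'Бла бла RAMIRO2-TED',
    'LA PLYSGNE 1 H - 001',
];

foreach ($names as $name) {
    echo var_export(latinPartWordBoundary($name), true), PHP_EOL;
}
```

Because every separator run must be followed by a letter or digit, the match can never begin or end with . or -, which is the restriction stated at the start of this answer.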
Matching at least a single char [0-9a-zA-Z], with the allowed chars . and - anywhere in the string, and asserting whitespace boundaries to the left and right.
(?<!\S) and (?!\S) are lookaround assertions that act as whitespace boundaries, asserting that there is no non-whitespace char to the left and to the right.
See a regex101 demo.
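The pattern for this variant is also missing here, so the following is only a sketch built from the description: at least one [0-9a-zA-Z], with . and - allowed around it, wrapped in the two whitespace-boundary lookarounds. The concrete pattern and the name latinPartWhitespaceBoundary are assumptions of mine.

```php
<?php
// Hypothetical helper (the name is illustrative): returns the first Latin
// part delimited by whitespace boundaries, or null when there is none.
function latinPartWhitespaceBoundary(string $name): ?string
{
    // (?<!\S)             no non-whitespace char directly to the left
    // [.-]*               optional leading dots or hyphens
    // [0-9a-zA-Z]         at least one ASCII letter or digit is required
    // [0-9a-zA-Z.\h-]*    then letters, digits, dots, hyphens, horizontal whitespace
    // (?!\S)              no non-whitespace char directly to the right
    $pattern = '/(?<!\S)[.-]*[0-9a-zA-Z][0-9a-zA-Z.\h-]*(?!\S)/u';

    return preg_match($pattern, $name, $m) === 1 ? $m[0] : null;
}

// Test set from the question.
$names = [
    'БлаблаБла Uty-223',
    'Блабла (бла.)Бла CAROP-C',
    'Бла бла ST.MORITZ',
    'Бла бла RAMIRO2-TED',
    'LA PLYSGNE 1 H - 001',
];

foreach ($names as $name) {
    echo var_export(latinPartWhitespaceBoundary($name), true), PHP_EOL;
}
```

Unlike the \b version, this one lets the Latin part begin or end with . or -, and the trailing (?!\S) forces the regex engine to backtrack off a trailing space when Cyrillic text follows the Latin part.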
You can also use a script run starting with a Latin letter:
demo
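The script-run pattern is not preserved in this copy either. As a rough sketch only: PCRE2 has a (*sr:...) group (short for (*script_run:...)) that succeeds only if all matched characters belong to one script, with Common characters such as digits, spaces and punctuation allowed. The pattern below is my guess at the idea, it assumes a reasonably recent PCRE2 build behind PHP's preg functions, and the name latinPartScriptRun is hypothetical.

```php
<?php
// Hypothetical helper (the name is illustrative). Assumes the PCRE2 library
// used by PHP supports script runs; older builds reject this pattern.
function latinPartScriptRun(string $name): ?string
{
    // (*sr: ... )   script run: the matched text must stay within one script
    // \p{Latin}     start at a Latin letter
    // [\w\h.-]*     then word chars, horizontal whitespace, dots, hyphens;
    //               the script-run check is what rejects Cyrillic here
    // The u modifier (UTF-8 mode) is needed for \p{Latin} on UTF-8 input.
    $pattern = '/(*sr:\p{Latin}[\w\h.-]*)/u';

    return preg_match($pattern, $name, $m) === 1 ? $m[0] : null;
}

// Test set from the question.
$names = [
    'БлаблаБла Uty-223',
    'Блабла (бла.)Бла CAROP-C',
    'Бла бла ST.MORITZ',
    'Бла бла RAMIRO2-TED',
    'LA PLYSGNE 1 H - 001',
];

foreach ($names as $name) {
    echo var_export(latinPartScriptRun($name), true), PHP_EOL;
}
```

Note that this sketch has no boundary checks, so if Cyrillic text follows the Latin part, the match may keep a trailing space; trim() the result or add a lookahead if that matters.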