I want to analyze texts for words and/or terms that don't contain predefined syllables or letters, but I don't get the result I expect.
I tried many things; my closest attempt was this one.
In the sentence "He was eating the cake se in hell." (some of these are not real words; they are only there to check whether letter combinations are handled the way I expect), I want to get all words except those that contain "he", whether at the beginning, in the middle, or at the end.
Example:
```php
$pattern = '#\b(\w*[^(*he*|\s)])\b#i';
$text = 'He was eating the cake se in hell.';
if (preg_match_all($pattern, $text, $match)) {
    var_dump($match);
} else {
    echo "Match not found.";
}
```
I would expect to get
[was, eating, cake, se, in]
but I got
[was, eating, in, hell].
Why not "cake"? Why "hell"?
In fact, my use case is in German, but because most users here don't speak German, I used the English example above. Another problem is that \w doesn't match the letters üÜöÖäÄß, which I also need.
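The behavior asked about comes from the character class. Assuming the original pattern was `#\b(\w*[^(*he*|\s)])\b#i` (the backslashes appear to have been lost in the post), `[^(*he*|\s)]` does not mean "anything except he". Inside square brackets, `(`, `*`, `|` and `)` are literal characters, so the class matches one single character that is not `(`, `*`, `h`, `e`, `|`, `)` or whitespace (case-insensitively, because of `i`). Every match must therefore end in such a character right before a word boundary. "He", "the", "cake" and "se" all end in "e", so they are rejected; "hell" ends in "l", so it is accepted. A small sketch that reproduces this:

```php
<?php
// The asker's pattern, assuming the lost backslashes are restored.
$pattern = '#\b(\w*[^(*he*|\s)])\b#i';
$text = 'He was eating the cake se in hell.';
preg_match_all($pattern, $text, $match);
print_r($match[0]); // was, eating, in, hell

// [^(*he*|\s)] is a set of single characters, not a negated word:
// it rejects "(", "*", "h", "e", "|", ")" and whitespace (case-insensitively).
foreach (['c', 'k', 'l', '.', 'e', 'H', '|', ' '] as $ch) {
    printf("%s => %d\n", var_export($ch, true), preg_match('#^[^(*he*|\s)]$#i', $ch));
}
```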
3 Answers
It would be more flexible to use string functions.
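A minimal sketch of that idea, assuming words can be split on any run of non-letters. The helper `wordsWithout` is hypothetical. It uses `preg_split` with `\P{L}` and the `u` modifier only to tokenize (so ä, ö, ü and ß stay inside words), then filters with the plain string function `stripos`. Note that `stripos` folds case for ASCII letters only, which is fine for a needle like "he" but not for needles containing umlauts.

```php
<?php
// Hypothetical helper: returns the words of $text that contain none of the
// $forbidden substrings (case-insensitive for ASCII letters only).
function wordsWithout(string $text, array $forbidden): array
{
    // Split on runs of non-letters; /u makes \P{L} Unicode-aware,
    // so German letters such as ä, Ö or ß remain part of the words.
    $words = preg_split('/\P{L}+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    $kept = array_filter($words, function (string $word) use ($forbidden): bool {
        foreach ($forbidden as $needle) {
            if (stripos($word, $needle) !== false) {
                return false; // the word contains a forbidden sequence
            }
        }
        return true;
    });

    return array_values($kept); // reindex 0..n-1
}

print_r(wordsWithout('He was eating the cake se in hell.', ['he'])); // was, eating, cake, se, in
```

Adding another syllable only means extending the array, e.g. `['he', 'ch']`, without touching any regular expression.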
To achieve both conditions, you can change the pattern to:
You can extract matches of the regular expression
Demo
At the link you can see which words are matched.
The flags set for the regular expression are:
- i: case-insensitive match
- u: match with full Unicode

The regular expression has the following elements.
Here I’ve used the tempered greedy token solution.
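For illustration, here is one way to write a tempered greedy token for this task; it is a sketch and not necessarily the exact pattern from this answer. The token `(?:(?!he)\p{L})+` consumes a letter only if "he" does not start at that position, and the lookarounds `(?<!\p{L})` and `(?!\p{L})` serve as Unicode-aware word boundaries, so a word is matched only if it can be consumed completely without ever starting "he". The `$forbidden` list is an assumption added to show how more syllables could be plugged in; `preg_quote` escapes any regex metacharacters in them.

```php
<?php
// Forbidden sequences (assumed list); preg_quote escapes metacharacters.
$forbidden = ['he'];
$alternation = implode('|', array_map(fn ($s) => preg_quote($s, '/'), $forbidden));

// (?<!\p{L}) ... (?!\p{L}) : Unicode-aware word boundaries.
// (?:(?!(?:he))\p{L})+     : tempered greedy token, each letter is consumed
//                            only if no forbidden sequence starts there.
// i = case-insensitive, u = UTF-8 / Unicode properties such as \p{L}.
$pattern = '/(?<!\p{L})(?:(?!(?:' . $alternation . '))\p{L})+(?!\p{L})/iu';

preg_match_all($pattern, 'He was eating the cake se in hell.', $matches);
print_r($matches[0]); // was, eating, cake, se, in
```

Because `\p{L}` covers all Unicode letters, German words such as "Schöne" or "Äpfel" are matched too, while "gehen" is excluded.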