Search with PHP preg_match_all for words that don't contain a given syllable

FSand
March 14, 2023
280 views
3 votes
3 Answers

I want to analyze texts for words and/or terms that don't include predefined syllables/letters, but I don't get the result I expect.

I tried many things; the attempt below came closest.
Take the sentence "He was eating the cake se in hell." (some of it is not real English; it is only there to check whether letter combinations are handled as expected). I want to get all words except those that contain "he", whether at the beginning, in the middle, or at the end.

Example:

$pattern = '#\b(\w*[^(*he*|\s)])\b#i';
$text = 'He was eating the cake se in hell.';
if (preg_match_all($pattern, $text, $match)) {
    var_dump($match);
} else{
    echo "Match not found.";
}

I would expect to get

[was, eating, cake, se, in]

but I got

[was, eating, in, hell].

Why not "cake"? Why "hell"?

My actual use case is in German, but since most users here don't speak German I am using the example above. Another problem is that \w doesn't match the letters üÜöÖäÄß, which I also need.

Tags: php regex

Answers

It would be more flexible to use string functions:

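<?php

$string = 'He was eating the cake se in hell.';
$filter = 'he';

// Return the words of $string that do not contain $filter (case-insensitive).
function wordFilter($string, $filter)
{
        $filtered = [];
        // Format 1 returns the words as an array. Note that str_word_count()
        // does not support multibyte locales, so UTF-8 umlauts need other handling.
        $words = str_word_count($string, 1);
        foreach ($words as $word) {
                // str_contains() requires PHP 8.
                if (!str_contains(strtolower($word), strtolower($filter))) {
                        $filtered[] = $word;
                }
        }
        return $filtered;
}

$result = wordFilter($string, $filter);
print_r($result);
?>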
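
If the text contains umlauts, a multibyte-aware variant of the same idea may be safer. The following is only a sketch: it assumes valid UTF-8 input and the mbstring extension, and the function name `wordFilterUnicode` is made up for illustration.

```php
<?php

// Hypothetical helper (not part of the original answer): returns the words of
// $string that do not contain $filter, compared case-insensitively.
// Assumes valid UTF-8 input and the mbstring extension.
function wordFilterUnicode(string $string, string $filter): array
{
    // Split on runs of non-letters; with the u flag, \P{L} treats üÜöÖäÄß as letters.
    $words = preg_split('/\P{L}+/u', $string, -1, PREG_SPLIT_NO_EMPTY);
    $needle = mb_strtolower($filter, 'UTF-8');

    // Keep only the words in which the lowercased needle does not occur.
    return array_values(array_filter(
        $words,
        fn (string $word): bool => mb_strpos(mb_strtolower($word, 'UTF-8'), $needle) === false
    ));
}

print_r(wordFilterUnicode('He was eating the cake se in hell.', 'he'));
// was, eating, cake, se, in
print_r(wordFilterUnicode('Die Hähne krähen über der Straße.', 'äh'));
// Die, über, der, Straße
```

Splitting on `\P{L}+` instead of calling `str_word_count()` is a design choice made here so that the word boundaries do not depend on the locale.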

- Tushar
- March 14, 2023 at 7:55 am
- 0 votes
> I want to get all words but not those which have a "he" in the word
> whether it is the beginning, inside, or the ending.

and

> Also a problem is that \w wouldn't consider üÜöÖäÄß letters which I
> also need.

To satisfy both conditions, you can change the pattern to:
```
$pattern = '/\b(?!.*\bhe\b)[\p{L}üÜöÖäÄß]+\b/u';
```
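
A lookahead that contains `.*` scans the rest of the string, not just the current word. One way to scope the check to a single word is to let the lookahead run only over letters, as in `(?!\p{L}*he)`. The sketch below works under that assumption; the pattern and the function name `wordsWithout` are illustrative and assume UTF-8 input.

```php
<?php

// Illustrative helper: returns all words in $text that do not contain $syllable.
// Assumes UTF-8 input; $syllable is quoted so that it is matched literally.
function wordsWithout(string $text, string $syllable): array
{
    $s = preg_quote($syllable, '/');
    // (?<!\p{L})        the match must start at the beginning of a word
    // (?!\p{L}*<syl>)   the syllable must not occur among this word's letters
    // \p{L}+            consume the whole word
    $pattern = '/(?<!\p{L})(?!\p{L}*' . $s . ')\p{L}+/iu';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

print_r(wordsWithout('He was eating the cake se in hell.', 'he'));
// was, eating, cake, se, in
```

The lookbehind `(?<!\p{L})` stands in for `\b` here so that the result does not depend on how `\b` treats non-ASCII letters.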

- CarySwoveland
- March 14, 2023 at 9:04 am
- 0 votes
You can extract matches of the regular expression
```
/\b(?:(?!he)\p{L})+\b/iu
```
Demo

You can see at the link that the following words were matched.
```
He was eating the cake se in hell üÜöÖäÄß.
   ^^^ ^^^^^^     ^^^^ ^^ ^^      ^^^^^^^
```
The flags set for the regular expression are
- i: case insensitive match
- u: match with full Unicode
The regular expression has the following elements.
```
\b        match a word boundary
(?:       begin a non-capture group
  (?!he)  negative lookahead asserts next two characters are not 'he'
  \p{L}   match a letter
)+        end non-capture group and execute one or more times
\b        match a word boundary
```
Here I’ve used the tempered greedy token solution.
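
To try this pattern from PHP, something like the following should work. It is a sketch that assumes UTF-8 input; the function name `wordsWithoutHe` is arbitrary and not part of the answer.

```php
<?php

// Illustrative wrapper around the tempered greedy token pattern:
// each letter is consumed only if "he" does not start at that position.
function wordsWithoutHe(string $text): array
{
    $pattern = '/\b(?:(?!he)\p{L})+\b/iu';
    preg_match_all($pattern, $text, $match);
    return $match[0]; // full matches only
}

print_r(wordsWithoutHe('He was eating the cake se in hell.'));
// was, eating, cake, se, in

// The sentence from the listing above, including the umlauts:
print_r(wordsWithoutHe('He was eating the cake se in hell üÜöÖäÄß.'));
```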