I have a list of strings and regexes, and I want to check which of them match a given input string.

Let's say I have this list:

$list = [ // strings and regexes to check against the input
  "lorem ipsum", // a phrase
  "example", // another word
  "/(nulla)/", // a regex
];

And the string:

$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo  metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";

I want to check it like this:

if( $matched_string >= 1 ){ // check if at least one string matched
 // do something...
 // output matched string: "lorem ipsum", "nulla"
}else{
 // nothing matched
}

How can I do something like that?

3 Answers


  1. I’m not sure whether this approach would work for your case, but you could treat them all as regexes.

    $list = [ // an array list of string/regex that i want to check
      "lorem ipsum", // a words
      "Donec mattis",
      "example", // another word
      "/(nulla)/", // a regex
      "/lorem/i"
    ];
    $input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo  metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";
    
    $is_regex = '/^\/.*\/[igm]*$/'; // entry already looks like /pattern/flags
    $list_matches = [];
    foreach($list as $str){
        // create a regex from the string if it isn't already
        $patt = (preg_match($is_regex, $str))? $str: "/$str/";
        $item_matches = [];
        preg_match($patt, $input_string, $item_matches);
        if(!empty($item_matches)){
            // only add to the list if matches
            $list_matches[$str] = $item_matches;
        }
    }
    if(empty($list_matches)){
        echo 'No matches from the list found';
    }else{
        var_export($list_matches);
    }
    
    

    The above will output the following:

    array (
      'Donec mattis' => 
      array (
        0 => 'Donec mattis',
      ),
      '/(nulla)/' => 
      array (
        0 => 'nulla',
        1 => 'nulla',
      ),
      '/lorem/i' => 
      array (
        0 => 'Lorem',
      ),
    )
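
    The expected output in the question lists "lorem ipsum" as a match, which the code above does not produce because plain entries are matched case-sensitively and are not escaped. What follows is only a minimal sketch of one possible variant, assuming plain entries should be escaped with preg_quote() and matched case-insensitively. The helper name find_list_matches is hypothetical and not part of the answer.

    ```php
    <?php
    // Hypothetical helper (an assumption, not from the answer above):
    // returns the matched text for every list entry that hits.
    // Entries shaped like /.../flags are used as-is; anything else is
    // escaped with preg_quote() and matched case-insensitively.
    function find_list_matches(array $list, string $input): array
    {
        $found = [];
        foreach ($list as $entry) {
            $isRegex = preg_match('/^\/.*\/[a-zA-Z]*$/', $entry) === 1;
            $pattern = $isRegex ? $entry : '/' . preg_quote($entry, '/') . '/i';
            if (preg_match($pattern, $input, $m) === 1) {
                $found[$entry] = $m[0]; // keep only the full match
            }
        }
        return $found;
    }

    $hits = find_list_matches(
        ['lorem ipsum', 'example', '/(nulla)/'],
        'Lorem ipsum dolor sit amet. Donec mattis, nulla ac suscipit maximus.'
    );
    var_export($hits);
    ```

    An empty result array then plays the role of the "nothing matched" branch from the question.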
    

  2. Try the following:

    <?php
    $input_string = "assasins: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo  metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";
    
    $list = [ // an array list of string/regex that i want to check
    "ass", // should NOT match the "ass" in "assasins" (word boundaries)
    "Lorem ipsum", // a words
    "consectetur", // another word
    "/(nu[a-z]{2}a)/", // a regex
    ];
    $regex_list = [];
    foreach($list as $line) {
        if ($line[0] == '/' and $line[-1] == '/')
            $regex = '(?:' . substr($line, 1, -1) . ')';
        else
            $regex = '\b' . preg_quote($line, '/') . '\b';
        $regex_list[] = $regex;
    }
    $regex = '/' . implode('|', $regex_list) . '/';
    echo "$regex\n";
    preg_match_all($regex, $input_string, $matches, PREG_SET_ORDER);
    print_r($matches);
    
    $s = [];
    foreach ($matches as $match) {
        $s[] = $match[0];
    }
    $s = json_encode($s);
    echo "Matched strings: ", substr($s, 1, -1), "\n";
    

    Prints:

    /\bass\b|\bLorem ipsum\b|\bconsectetur\b|(?:(nu[a-z]{2}a))/
    Array
    (
        [0] => Array
            (
                [0] => Lorem ipsum
            )
    
        [1] => Array
            (
                [0] => consectetur
            )
    
        [2] => Array
            (
                [0] => nulla
                [1] => nulla
            )
    
    )
    Matched strings: "Lorem ipsum","consectetur","nulla"
    

    Discussion and Limitations

    In processing each element of $list, a string that begins and ends with ‘/’ is assumed to be a regular expression, and the ‘/’ delimiters are stripped from both ends. Anything else is treated as a plain string. This implies that to match a plain string that happens to begin and end with ‘/’, e.g. ‘/./’, the OP would have to write it as a regular expression instead: ‘/\/\.\//’. A plain string is passed through preg_quote to escape characters that have special meaning in regular expressions, which turns it into a regex without the opening and closing ‘/’ delimiters (the first example also wraps it in \b word boundaries, which is why "ass" does not match inside "assasins"). Finally, all the pieces are joined with the alternation operator, ‘|’, and the result is wrapped in ‘/’ delimiters to form a single regular expression.

    The main limitation is that backreference numbers are not adjusted automatically when several regular expressions in the input list have capture groups: combining the expressions changes the group numbering. Each such pattern must therefore account for the capture groups in the patterns before it and adjust its backreferences accordingly (see the backreference demo below).

    Regex flags (i.e. pattern modifiers) must be embedded within the regex itself. Because a flag set in one regex of $list also affects the regexes that follow it in the combined pattern, a flag that should not apply to a subsequent regex must be explicitly turned off there:

    <?php
    $input_string = "This is an example by Booboo.";
    
    $list = [ // an array list of string/regex that i want to check
    "/(?i)booboo/", // case insensitive
    "/(?-i)EXAMPLE/" // explicitly case sensitive again
    ];
    $regex_list = [];
    foreach($list as $line) {
        if ($line[0] == '/' and $line[-1] == '/')
            $regex_list[] = substr($line, 1, -1);
        else
            $regex_list[] = preg_quote($line, '/');
    }
    $regex = '/' . implode('|', $regex_list) . '/';
    echo $regex, "\n";
    preg_match_all($regex, $input_string, $matches, PREG_SET_ORDER);
    print_r($matches);
    
    $s = [];
    foreach ($matches as $match) {
        $s[] = $match[0];
    }
    $s = json_encode($s);
    echo "Matched strings: ", substr($s, 1, -1), "\n";
    

    Prints:

    /(?i)booboo|(?-i)EXAMPLE/
    Array
    (
        [0] => Array
            (
                [0] => Booboo
            )
    
    )
    Matched strings: "Booboo"
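
    As an alternative to switching a flag off again, PCRE also lets a flag be scoped to a single group with the (?i:...) form, so it does not leak into later alternatives. The following is a rough sketch of that idea, assuming the list author is free to write the entries this way. It is not taken from the answer's own code.

    ```php
    <?php
    $input_string = "This is an example by Booboo.";

    $list = [
        "/(?i:booboo)/", // case insensitive, scoped to this group only
        "/EXAMPLE/",     // stays case sensitive without needing (?-i)
    ];

    $regex_list = [];
    foreach ($list as $line) {
        // strip the surrounding delimiters, as in the examples above
        $regex_list[] = substr($line, 1, -1);
    }
    $regex = '/' . implode('|', $regex_list) . '/';

    preg_match_all($regex, $input_string, $matches);
    print_r($matches[0]); // full matches only
    ```

    Here "Booboo" matches and the lowercase "example" does not, because the i flag ends with its group.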
    

    This shows how to correctly handle backreferences by manually adjusting the group numbers:

    <?php
    $input_string = "This is the 22nd example by Booboo.";
    
    $list = [ // an array list of string/regex that i want to check
    "/([0-9])\\1/", // two consecutive identical digits
    "/(?i)([a-z])\\2/" // two consecutive identical letters (group 2 in the combined regex)
    ];
    $regex_list = [];
    foreach($list as $line) {
        if ($line[0] == '/' and $line[-1] == '/')
            $regex_list[] = substr($line, 1, -1);
        else
            $regex_list[] = preg_quote($line, '/');
    }
    $regex = '/' . implode('|', $regex_list) . '/';
    echo $regex, "\n";
    preg_match_all($regex, $input_string, $matches, PREG_SET_ORDER);
    print_r($matches);
    
    $s = [];
    foreach ($matches as $match) {
        $s[] = $match[0];
    }
    $s = json_encode($s);
    echo "Matched strings: ", substr($s, 1, -1), "\n";
    

    Prints:

    /([0-9])\1|(?i)([a-z])\2/
    Array
    (
        [0] => Array
            (
                [0] => 22
                [1] => 2
            )
    
        [1] => Array
            (
                [0] => oo
                [1] =>
                [2] => o
            )
    
        [2] => Array
            (
                [0] => oo
                [1] =>
                [2] => o
            )
    
    )
    Matched strings: "22","oo","oo"
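
    One way to avoid renumbering by hand is PCRE's relative backreference, \g{-1}, which refers to the most recently opened capture group before it. The sketch below assumes each list entry only ever needs to refer back to its own groups. It is an illustration, not part of the original answer.

    ```php
    <?php
    $input_string = "This is the 22nd example by Booboo.";

    $list = [
        '/([0-9])\g{-1}/',     // two consecutive identical digits
        '/(?i)([a-z])\g{-1}/', // two consecutive identical letters
    ];

    $regex_list = [];
    foreach ($list as $line) {
        // strip the surrounding delimiters before combining
        $regex_list[] = substr($line, 1, -1);
    }
    $regex = '/' . implode('|', $regex_list) . '/';

    preg_match_all($regex, $input_string, $matches, PREG_SET_ORDER);
    $matched = array_column($matches, 0); // full match of each hit
    echo json_encode($matched), "\n";
    ```

    Because \g{-1} is resolved relative to its own position, the entries keep working regardless of how many capture groups earlier entries contribute.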
    
  3. Typically, I scream bloody murder if someone dares to stink up their code with error suppressors. If your input data is so out-of-your-control that you are allowing a mix of regex and non-regex input strings, then I guess you’ll probably condone @ in your code as well.

    Check whether each search string is a valid regex, as demonstrated here. If it is not, call preg_quote() on it and wrap it in delimiters to form a valid regex pattern before applying it to the actual haystack string.

    Code: (Demo)

    $list = [ // an array list of string/regex that i want to check
      "lorem ipsum", // a words
      "example", // another word
      "/(nulla)/", // a valid regex
      "/[,.]/", // a valid regex
      "^dolor^", // a valid regex
      "/path/to/dir/", // not a valid regex
      "[integer]i", // valid regex not implementing a character class
    ];
    
    $input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, /path/to/dir/ nulla ac suscipit maximus, leo  metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";
    
    $result = [];
    foreach($list as $v) {
        if (@preg_match($v, '') === false) {
            // not a regex, make into one
            $v = '/' . preg_quote($v, '/') . '/';
        }
        preg_match($v, $input_string, $m);
        $result[$v] = $m[0] ?? null;
    }
    var_export($result);
    

    Or you could write the same thing this way, although I don’t know whether validating the pattern against the full (non-empty) string costs anything in performance: (Demo)

    $result = [];
    foreach($list as $v) {
        if (@preg_match($v, $input_string, $m) === false) {
            preg_match('/' . preg_quote($v, '/') . '/', $input_string, $m);
        }
        $result[$v] = $m[0] ?? null;
    }
    var_export($result);
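
    If the @ operator is a sticking point, the same validity check can be sketched with a temporary error handler instead. This is only an assumption-laden illustration: the helper name is_valid_regex is hypothetical, and it relies on preg_match() returning false when the pattern fails to compile, as the code above already does.

    ```php
    <?php
    // Hypothetical helper (not from the answer): reports whether $pattern
    // compiles as a PCRE pattern, without using the @ operator.
    function is_valid_regex(string $pattern): bool
    {
        // Swallow the compile warning for the duration of this call only.
        set_error_handler(static function (): bool {
            return true;
        });
        try {
            return preg_match($pattern, '') !== false;
        } finally {
            restore_error_handler();
        }
    }

    foreach (['lorem ipsum', '/(nulla)/', '/path/to/dir/'] as $candidate) {
        echo $candidate, ' => ', is_valid_regex($candidate) ? 'regex' : 'plain', "\n";
    }
    ```

    The trade-off is a little more ceremony in exchange for keeping the suppression confined to one clearly named function.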
    