
I’m building a custom search result where I want to return n characters from left and right of the searched keyword. I would also like to preserve whole words at the beginning and the end.

For example this is the text where I searched the keyword and I
need the text around it too.

So if I say n characters is 10 I would preferably get:

..searched the keyword and I need..

A simpler acceptable solution would be to break the words so the result would be:

..rched the keyword and I nee..

I started with this but got stuck on the string part before the keyword:

private function getSubstring($content,$keyword, $nOfChars) {
    $content = strtolower(strip_tags($content));
    $noOffoundStrings = substr_count($content, $keyword);
    $position = strpos($content, $keyword);
    $keywordLength = strlen($keyword);
    $afterKey = substr($content, $position + $keywordLength, $nOfChars);
    $beforeKey = substr($content, $position , -???); // how to get string part before the searched keyword
}

3 Answers


  1. I have concentrated on the building of the result set only.

    The adornment (... before and after) is static and doesn’t handle the edge cases where the keyword occurs at the very beginning or end of the text.

    Keeping whole words isn’t handled either (that adds too much complexity to the answer). If you are satisfied with an answer to this question you may want to ask a new question for that.

    The mb_* variants of the string functions work with non-English text (Latin alphabet with diacritics [ő, ű, â, î, ș, ț, etc.], Hebrew, Arabic, Hindi, etc.).

    $str = strip_tags('<p>This is a search text <span>with</span> some content blabla blabla search text of length</p>');
    
    $keyword = 'search';
    
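    // Lowercase with mb_strtolower() so multibyte text is handled like in the mb_* calls below.
    $a = explode(mb_strtolower($keyword), mb_strtolower($str));
    $resultArray = [];
    $keepChars = 10;
    
    // Each pair of neighbouring chunks surrounds one occurrence of the keyword.
    for ($i = 0; $i < count($a) - 1; $i++) {
        $beforeKey = $a[$i];
        $afterKey = $a[$i + 1];
        // A negative start counts from the end; strings shorter than $keepChars are returned whole.
        $resultArray[] = '...' 
                       . mb_substr($beforeKey, -$keepChars) 
                       . $keyword 
                       . mb_substr($afterKey, 0, $keepChars) 
                       . '...';
    }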
    
    var_dump($resultArray);
    

    This should output the following:

    array(2) {
      [0]=>
      string(32) "...this is a search text with..."
      [1]=>
      string(32) "...la blabla search text of l..."
    }
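
The edge cases mentioned above (keyword at the very beginning or end of the text) could be handled along these lines. This is only a minimal sketch, not part of the original answer: the function name `snippetsAround` and its parameters are made up for illustration, and it assumes the mbstring extension is available. It adds the dots only on a side where text was actually cut off, and it keeps the original casing by searching with `mb_stripos()` instead of lowercasing the text.

```php
<?php
// Hypothetical helper (name and signature are assumptions, not from the answer above).
// Returns one snippet per match, adding '...' only where text was actually cut off.
function snippetsAround(string $text, string $keyword, int $keepChars = 10): array
{
    $results = [];
    if ($keyword === '') {
        return $results; // nothing to search for
    }

    $textLen = mb_strlen($text);
    $keyLen  = mb_strlen($keyword);
    $offset  = 0;

    // mb_stripos() searches case-insensitively, so the text keeps its original casing.
    while (($pos = mb_stripos($text, $keyword, $offset)) !== false) {
        $start = max(0, $pos - $keepChars);
        $end   = min($textLen, $pos + $keyLen + $keepChars);

        $results[] = ($start > 0 ? '...' : '')
                   . mb_substr($text, $start, $end - $start)
                   . ($end < $textLen ? '...' : '');

        $offset = $pos + $keyLen; // continue searching after this match
    }

    return $results;
}

var_dump(snippetsAround('This is a search text with some content blabla blabla search text of length', 'search'));
```

With the sample text from this answer, the first snippet starts at the beginning of the text and therefore gets no leading dots, while the second one is cut on both sides.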
    
  2. You could use the explode function:

        $numChar = 12;
        $string = "apelle figlio di apollo fece una palla";
        $searched = "apollo";
        
        $exploded = explode($searched, $string);
        
        if(count($exploded) == 1) {
            //no match
            return "";
        } 
        
        $explodedBefore = array_reverse(explode(" ", trim($exploded[0])));
        
        $before = "";
        
        // Walk backwards from the keyword, prepending whole words until $numChar is reached.
        foreach($explodedBefore as $word) {
            if(strlen($before) >= $numChar) {
                break;
            }
            $before = $word . " " . $before;
        }
      
        
        $explodedAfter = explode(" ", trim($exploded[1]));
        
        $after = "";
        
        // Append whole words after the keyword until $numChar is reached.
        foreach($explodedAfter as $word) {
            if(strlen($after) >= $numChar) {
                break;
            }
            $after .= " " . $word;
        }
      
        
        
        $complete = $before . $searched . $after;
        echo $complete;
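
The snippet above uses `return ""`, which suggests it is meant to live inside a function. A possible wrapper could look like the sketch below. The name `wholeWordContext` and its signature are assumptions made for illustration. Unlike the snippet above, it splits around the first occurrence only (`explode()` with a limit of 2) and splits words on any whitespace with `preg_split()`.

```php
<?php
// Hypothetical wrapper; the function name and signature are illustrative only.
function wholeWordContext(string $string, string $searched, int $numChar = 12): string
{
    if ($searched === '') {
        return ''; // explode() rejects an empty delimiter
    }

    // Split around the first occurrence only.
    $parts = explode($searched, $string, 2);
    if (count($parts) === 1) {
        return ''; // no match
    }

    // Words before the keyword, nearest first; prepend until $numChar is reached.
    $before = '';
    $wordsBefore = preg_split('/\s+/', $parts[0], -1, PREG_SPLIT_NO_EMPTY);
    foreach (array_reverse($wordsBefore) as $word) {
        if (strlen($before) >= $numChar) {
            break;
        }
        $before = $word . ' ' . $before;
    }

    // Words after the keyword; append until $numChar is reached.
    $after = '';
    foreach (preg_split('/\s+/', $parts[1], -1, PREG_SPLIT_NO_EMPTY) as $word) {
        if (strlen($after) >= $numChar) {
            break;
        }
        $after .= ' ' . $word;
    }

    return $before . $searched . $after;
}

echo wholeWordContext('apelle figlio di apollo fece una palla', 'apollo', 5), "\n";
// prints: figlio di apollo fece
```

Because whole words are kept, each side can end up longer than `$numChar`; the limit only decides when to stop adding words.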
    
  3. I am comfortable recommending a regex approach because it concisely affords precise handling of needles at the start, middle, and end of the haystack string.

    This will try to show full words on both sides of the needle. Logically, if there is no text on one side of the needle, no dots are added on that side.

    Code: (Demo)

    $needle = "keyword";
    $extra = 10;
    
    foreach ($texts as $text) {
        $new = preg_replace_callback(
                   "/.*?(\S+.{0,$extra})?($needle)(.{0,$extra}\S+)?.*/",
                   function($m) {
                       return sprintf(
                           '%s<b>%s</b>%s',
                           strlen($m[1]) ? "..{$m[1]}" : '',
                           $m[2],
                           strlen($m[3] ?? '') ? "{$m[3]}.." : ''
                       );
                   }, 
                   $text,
                   1,
                   $count
               );
        echo ($count ? $new : '') . "\n";
    }
    

    Input:

    $texts = [
        "For example this is the text where I searched the keyword and I need the text around it too.",
        "keyword at the very start",
        "Or it can end with keyword",
        "Nothing to see here officer.",
        "keyword",
    ];
    

    Output:

    ..searched the <b>keyword</b> and I need..
    <b>keyword</b> at the very..
    ..can end with <b>keyword</b>
    
    <b>keyword</b>
    

    Pattern breakdown:

    /               #starting pattern delimiter
    .*?             #lazily match zero or more characters (giving back as much as possible)
    (               #start capture group 1
      \S+           #match one or more non-whitespace characters
      .{0,$extra}   #match between 0 and 10 characters
    )?              #end capture group 1 and make matching optional
    ($needle)       #match the needle string as capture group 2
    (               #start capture group 3
      .{0,$extra}   #match between 0 and 10 characters
      \S+           #match one or more non-whitespace characters
    )?              #end capture group 3 and make matching optional
    .*              #greedily match zero or more characters
    /
    
    • Add the u pattern modifier if multibyte characters might be encountered.
    • Add the i pattern modifier for case-insensitive matching.
    • Add the s pattern modifier if your string might contain newline characters.
    • Wrap the needle string in \b (word boundary metacharacters) for whole word matching.
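
To make the notes above concrete, here is one way the modifiers and word boundaries might be combined. Treat it as a sketch under assumptions: the wrapper name `highlightSnippet` is hypothetical, and the `preg_quote()` call is an addition that the answer does not mention. It escapes regex metacharacters in the needle so that a needle like `a.b` is matched literally.

```php
<?php
// Hypothetical wrapper around the pattern from this answer (name is an assumption).
function highlightSnippet(string $text, string $needle, int $extra = 10): string
{
    // Addition not in the answer: escape regex metacharacters and the "/" delimiter.
    $quoted = preg_quote($needle, '/');

    // \b...\b: whole-word match; u: UTF-8; i: case-insensitive; s: "." also matches newlines.
    $pattern = "/.*?(\\S+.{0,$extra})?(\\b$quoted\\b)(.{0,$extra}\\S+)?.*/uis";

    $new = preg_replace_callback(
        $pattern,
        function (array $m): string {
            return sprintf(
                '%s<b>%s</b>%s',
                strlen($m[1]) ? "..{$m[1]}" : '',
                $m[2],
                strlen($m[3] ?? '') ? "{$m[3]}.." : ''
            );
        },
        $text,
        1,
        $count
    );

    return $count ? $new : '';
}

echo highlightSnippet('Or it can end with KEYWORD', 'keyword'), "\n";
// prints: ..can end with <b>KEYWORD</b>
```

Note that `\b` only behaves as expected when the needle starts and ends with a word character; for needles such as `c++` the boundaries would need to be dropped or replaced with lookarounds.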