I’m building a custom search result where I want to return n characters from left and right of the searched keyword. I would also like to preserve whole words at the beginning and the end.
For example:
this is the text where I searched the keyword and I need the text around it too.
So if n is 10, I would preferably get:
..searched the keyword and I need..
A simpler acceptable solution would be to break the words so the result would be:
..rched the keyword and I nee..
I started with this but got stuck on the string part before the keyword:
private function getSubstring($content, $keyword, $nOfChars) {
    $content = strtolower(strip_tags($content));
    $noOffoundStrings = substr_count($content, $keyword);
    $position = strpos($content, $keyword);
    $keywordLength = strlen($keyword);
    $afterKey = substr($content, $position + $keywordLength, $nOfChars);
    $beforeKey = substr($content, $position , -???); // how to get string part before the searched keyword
}
3 Answers
I have concentrated on the building of the result set only.
The adornment (... before and after) is static and doesn't handle the edge cases where the keyword occurs at the very beginning or end of the text. Keeping whole words isn't handled either (that would add too much complexity to the answer). If you are satisfied with an answer to this question, you may want to ask a new question for that.
The mb_* variants of the string functions work with non-English text (Latin alphabet with diacritics [ő, ű, â, î, ș, ț, etc.], Hebrew, Arabic, Hindi, etc.).
This should output the following:
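As a rough illustration of the approach described in this answer (not the answerer's original code), here is a minimal sketch. The function name getSnippet is hypothetical, and it assumes the mbstring extension is loaded and the text is UTF-8. It keeps the static ... adornment and does not preserve whole words:

```php
<?php
// Hypothetical helper: a sketch, not the answerer's original code.
// Returns up to $nOfChars characters on each side of the first
// occurrence of $keyword, or null when the keyword is not found.
function getSnippet(string $content, string $keyword, int $nOfChars): ?string
{
    $content = strip_tags($content);

    // Case-insensitive, multibyte-aware search for the first occurrence.
    $position = mb_stripos($content, $keyword);
    if ($position === false) {
        return null;
    }

    // Clamp the start so it never falls before the beginning of the text.
    $start = max(0, $position - $nOfChars);
    $length = ($position - $start) + mb_strlen($keyword) + $nOfChars;

    // Static adornment: the dots are added even at the text's edges.
    return '...' . mb_substr($content, $start, $length) . '...';
}
```

With the example text from the question, the keyword "keyword" and 10 characters, this returns ...rched the keyword and I nee...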
You could use the explode function.
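A minimal sketch of that idea, assuming a non-empty keyword (the function name getSubstringWithExplode is hypothetical). Splitting the content once at the keyword gives the text before and after it, and a negative offset in substr takes the last n characters of the part before the keyword, which is the piece the question got stuck on:

```php
<?php
// Hypothetical helper: a sketch of the explode idea, single-byte text only.
function getSubstringWithExplode(string $content, string $keyword, int $nOfChars): ?string
{
    $content = strtolower(strip_tags($content));
    $keyword = strtolower($keyword);

    // Split once at the first occurrence: [text before, text after].
    $parts = explode($keyword, $content, 2);
    if (count($parts) < 2) {
        return null; // keyword not found
    }
    [$before, $after] = $parts;

    // A negative offset counts from the end of the string.
    // Guard against 0, because substr($before, -0) returns the whole string.
    $beforeKey = $nOfChars > 0 ? substr($before, -$nOfChars) : '';
    $afterKey = substr($after, 0, $nOfChars);

    return '..' . $beforeKey . $keyword . $afterKey . '..';
}
```

For the question's example with 10 characters this gives ..rched the keyword and i nee.. (lowercased, as in the question's code). Words are broken at the edges, which is the simpler result the question accepts.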
I am comfortable recommending a regex approach because it concisely affords precise handling of needles at the start, middle, and end of the haystack string.
This will try to show full words on both sides of the needle. Logically if there are no words on either side, no dots will be added.
Pattern breakdown:
u: pattern modifier if multibyte characters might be encountered.
i: pattern modifier for case-insensitive matching.
s: pattern modifier if your string might contain newline characters.
\b: (word boundary metacharacters) for whole word matching.
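A pattern along these lines might look like the sketch below. It is not the answerer's original pattern: the function name getWordSnippet and the exact expression are assumptions made for illustration, and it assumes the needle begins and ends with a word character so that \b behaves as expected. It takes up to $n characters on each side of the needle and, where that window cuts a word, extends to the edge of that word:

```php
<?php
// Hypothetical helper: a sketch, not the answerer's original pattern.
function getWordSnippet(string $haystack, string $needle, int $n): ?string
{
    $haystack = trim($haystack);
    $n = max(0, $n);
    $quoted = preg_quote($needle, '~');

    // Left side:  \S*?(?=\S) completes a word cut by the window,
    //             .{0,n}? is the window itself (lazy, so the nearest
    //             occurrence of the needle is preferred).
    // Right side: .{0,n} is the window, (?<=\S)\S* drops trailing
    //             whitespace and completes a word cut by the window.
    $pattern = '~\S*?(?=\S).{0,' . $n . '}?\b' . $quoted
        . '\b.{0,' . $n . '}(?<=\S)\S*~uis';

    if (!preg_match($pattern, $haystack, $m, PREG_OFFSET_CAPTURE)) {
        return null; // no match (or invalid UTF-8 input)
    }
    [$snippet, $offset] = $m[0]; // offset is in bytes

    // Add dots only on a side where text was actually left out.
    $prefix = $offset > 0 ? '..' : '';
    $suffix = $offset + strlen($snippet) < strlen($haystack) ? '..' : '';

    return $prefix . $snippet . $suffix;
}
```

With the question's example text, the needle "keyword" and 10 characters, this returns ..searched the keyword and I need.. and, when the needle is the first word of the text, no leading dots are added.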