I have this regex:
$text = preg_replace_callback('/(\d+\.\d+|\b[A-Z](?:\.[A-Z])*\b\.?)|([.,;:!?)])\s*/', function ($matches) {
    return $matches[1] ? $matches[1] : $matches[2] . ' ';
}, $text);
It targets the ends of sentences and leaves abbreviations like N.B.C. untouched, and it works fine. The problem is that it doesn’t treat three dots (...) or the ellipsis character (…) as the end of a sentence.
How can I adjust the regex to include them as well?
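A small self-contained reproduction of the behaviour described above, using the question's own pattern and callback. The wrapper name `add_sentence_spacing_original` and the sample strings are hypothetical. Each dot of `...` is matched separately by the one-character class, so every dot gets its own space, while the multi-byte `…` is not in the class at all and is left alone.

```php
<?php
// Hypothetical wrapper around the question's original code, for illustration only.
function add_sentence_spacing_original(string $text): string
{
    return preg_replace_callback(
        // Group 1: decimals like 3.14 and abbreviations like N.B.C. (returned as-is).
        // Group 2: a single punctuation char, plus any whitespace after it.
        '/(\d+\.\d+|\b[A-Z](?:\.[A-Z])*\b\.?)|([.,;:!?)])\s*/',
        function ($matches) {
            // Keep group 1 unchanged; otherwise emit the punctuation and exactly one space.
            return $matches[1] ? $matches[1] : $matches[2] . ' ';
        },
        $text
    );
}

echo add_sentence_spacing_original('Wait...what'), PHP_EOL; // Wait. . . what
echo add_sentence_spacing_original('Wait…what'), PHP_EOL;   // Wait…what
```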
2 Answers
You can modify your regular expression to include three dots (...) or the ellipsis symbol (…) by adding a new group to the pattern that specifically looks for those characters. Here’s the adjusted regex:
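A minimal sketch of such an adjusted pattern, assuming the new alternative goes inside the question's second capture group, ahead of the single-character class, so that three dots are tried before a single dot. The `u` modifier (so `…` is handled as one UTF-8 character) and the function name `add_sentence_spacing` are assumptions for illustration.

```php
<?php
// Hypothetical helper: the question's callback with an extra "..." / "…" alternative.
function add_sentence_spacing(string $text): string
{
    return preg_replace_callback(
        // Group 2 now tries "..." or "…" before falling back to one punctuation char.
        // The u modifier is an assumption so that "…" is matched as a single character.
        '/(\d+\.\d+|\b[A-Z](?:\.[A-Z])*\b\.?)|(\.{3}|…|[.,;:!?)])\s*/u',
        function ($matches) {
            // Decimals and abbreviations are returned unchanged; punctuation gets one space.
            return $matches[1] ? $matches[1] : $matches[2] . ' ';
        },
        $text
    );
}

echo add_sentence_spacing('Wait...what'), PHP_EOL; // Wait... what
echo add_sentence_spacing('Wait…what'), PHP_EOL;   // Wait… what
```

Putting `\.{3}` first in the alternation matters: if the single-character class came first, it would consume the first dot and the three-dot branch would never be reached.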
This pattern now includes a subgroup (\.{3}|…) that looks for either three dots or the ellipsis character. It matches either of them and replaces it with itself followed by a space, just like the other punctuation in your original pattern.

If you want to add a space after a punctuation mark, but not inside dotted numbers or abbreviations, and only after exactly 1 or 3 dots, you could make use of (*SKIP)(*FAIL) and \K. In the replacement you can then use a single space with preg_replace.
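The two building blocks can be tried on a much smaller, hypothetical pattern (not the answer's full regex). `(*SKIP)(*F)` discards a decimal such as `3.14` and resumes the search after it, so the dot inside it is never considered; `\K` drops the matched dot from the reported match, leaving an empty match right after the dot, so `preg_replace` inserts the space instead of replacing the dot.

```php
<?php
// Hypothetical mini-pattern for illustration.
// Alternative 1: a decimal number; (*SKIP)(*F) throws it away and resumes after it.
// Alternative 2: a dot followed by a non-whitespace char; \K empties the match
// so the replacement is inserted after the dot rather than substituted for it.
$pattern = '/\d+\.\d+(*SKIP)(*F)|\.\K(?=\S)/';

echo preg_replace($pattern, ' ', 'Pi is 3.14.Next'), PHP_EOL;       // Pi is 3.14. Next

// Without the (*SKIP)(*F) branch, the dot inside the number is spaced too.
echo preg_replace('/\.\K(?=\S)/', ' ', 'Pi is 3.14.Next'), PHP_EOL; // Pi is 3. 14. Next
```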
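\b(?:\d+(?:\.\d+)+\b|[A-Z](?:\.[A-Z])*\b\.?)(*SKIP)(*F)|(?:[,;:!?…]+(?=[^\s,;:!?…])|(?<!\.)(?:\.{3}|\.)(?=[^\s.]))\K
The pattern matches: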
\b  A word boundary to prevent a partial match
(?:  Non-capturing group for the alternatives
  \d+(?:\.\d+)+\b  Match 1+ digits, then repeat 1+ times a dot and 1+ digits, followed by a word boundary
  |  Or
  [A-Z](?:\.[A-Z])*\b\.?  Match a single char A-Z, optionally repeat a dot and a char A-Z, then a word boundary and an optional dot
)  Close the non-capturing group
(*SKIP)(*F)  Skip the match
|  Or
(?:  Non-capturing group
  [,;:!?…]+(?=[^\s,;:!?…])  Match 1+ of the listed characters, and assert that the next char is neither whitespace nor one of the listed characters
  |  Or
  (?<!\.)  Negative lookbehind, assert there is no dot directly to the left
  (?:\.{3}|\.)  Match either 3 dots or 1 dot
  (?=[^\s.])  Positive lookahead, assert that the next char is neither whitespace nor a dot
)  Close the non-capturing group
\K  Forget what has been matched so far
Regex demo | PHP demo
For example
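A minimal PHP sketch of the whole approach, with the pattern reassembled from the parts listed above and written in `x` (extended) mode so each part carries its own comment. The function name `add_space_after_punctuation`, the `~` delimiter, and the `u` modifier are assumptions; `u` is what makes `…` count as a single character inside the character classes.

```php
<?php
// Hypothetical helper built from the pattern explained above.
function add_space_after_punctuation(string $text): string
{
    $pattern = '~
        \b(?:                              # a word boundary, then either
            \d+(?:\.\d+)+\b                #   a dotted number such as 3.14
          | [A-Z](?:\.[A-Z])*\b\.?         #   or an abbreviation such as N.B.C.
        )(*SKIP)(*F)                       # discard it and resume after it
      | (?:
            [,;:!?…]+(?=[^\s,;:!?…])       # punctuation run before a non-space char
          | (?<!\.)(?:\.{3}|\.)(?=[^\s.])  # 3 dots or 1 dot before a non-space, non-dot
        )
        \K                                 # forget the punctuation, leaving an empty match
    ~xu';

    // Each match is empty and sits right after the punctuation, so the space is inserted.
    return preg_replace($pattern, ' ', $text);
}

echo add_space_after_punctuation('Wait...what…really?yes'), PHP_EOL; // Wait... what… really? yes
```

Because the match itself is empty, a plain preg_replace with ' ' is enough; unlike the question's version, no callback is needed to write the punctuation back.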