
I’m trying to write a regular expression that captures the sentence immediately before each occurrence of "[bbcode]", but is flexible enough to work in different scenarios.

For example, the previous sentence may start after a period. However, it may also simply start on a new line. I cannot use ^ and $ to anchor the start or end of a line, because the sentence will not always begin or end there.

Whole test string:

Example 1:
Blah blah blah. THIS SENTENCE SHOULD BE SELECTED [bbcode]

Example 2:
THIS SENTENCE SHOULD BE SELECTED [bbcode]

Example 3:
A trick sentence. And another. THIS SENTENCE SHOULD BE SELECTED


[bbcode]

Expected matches:
All three instances of THIS SENTENCE SHOULD BE SELECTED should be matched.

This is the regex I tried:

'/(?:\.)(.+)(\[bbcode\])/gUs'

This fails when the sentence starts on a new line, as in Example 2.


I have tried negative lookbehinds, without success. The string "THIS SENTENCE SHOULD BE SELECTED" should be matched in all three examples.

Capturing surrounding spaces is fine, because I can trim them later.

Challenges:

  • The entire supplied text must be processed as one string. This is how the data will arrive, and it will likely contain many stray spaces, line breaks, etc., which the regex must handle.

  • It is probably impossible to prepare or sanitize the string first, because it will likely be very poorly formatted and lack proper punctuation. Collapsing the string (for example, removing line breaks) could merge separate sentences into unintended run-ons.

2 Answers


  1. This can be achieved with basic PHP functions. Something like this:

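    function extractSentence($string)
    {
        // Everything before the first "[bbcode]" tag.
        $before = substr($string, 0, strpos($string, '[bbcode]'));
        // Keep the text from the last period onward (all of it if there is no period),
        // then strip that period and the surrounding whitespace.
        return trim(substr($before, strrpos($before, '.')), " \t\r\n.");
    }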
    

    The advantage is that it is easy to understand, quick to write, and easier to change if the need arises.
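The function above stops at the first "[bbcode]", while the question requires every tag in one string to be handled. A rough sketch of how the same string-function idea might be extended is shown below. It is not part of the answer: `extractSentences()` is a hypothetical name, and treating a line break as the end of the previous sentence (needed for Example 2) is an added assumption.

```php
<?php
// Hypothetical extension of extractSentence(), not part of the answer:
// handles every "[bbcode]" tag and, as an added assumption, also treats
// a line break as the end of the previous sentence.
function extractSentences(string $string): array
{
    $tag = '[bbcode]';
    $sentences = [];
    $start = 0; // beginning of the text not yet searched

    while (($pos = strpos($string, $tag, $start)) !== false) {
        // Text between the previous tag (or the string start) and this tag,
        // without the whitespace that sits directly before the tag.
        $before = rtrim(substr($string, $start, $pos - $start));

        // Cut after the last period or line break, whichever comes later
        // (-1 means "none found", so the whole segment is kept).
        $dot = strrpos($before, '.');
        $newline = strrpos($before, "\n");
        $cut = max($dot === false ? -1 : $dot, $newline === false ? -1 : $newline);

        $sentences[] = trim(substr($before, $cut + 1));
        $start = $pos + strlen($tag);
    }

    return $sentences;
}
```

With the three examples concatenated into one string, this should return "THIS SENTENCE SHOULD BE SELECTED" three times; headings such as "Example 2:" on the preceding line are cut off by the line-break rule.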

  2. A single regular expression can also do it:

    1. Match, then release with \K, any leading spaces ( *\K), so they are not part of the match.
    2. Lazily match one or more non-period characters ([^.]+?).
    3. Look ahead for one or more whitespace characters followed by the bbcode tag ((?=\s+\[bbcode\])).
    4. Make the pattern case-insensitive in case the tag is uppercase (i).

    Code:

    $tests = [
        'Blah blah blah. THIS SENTENCE SHOULD BE SELECTED [bbcode] text',
        'THIS SENTENCE SHOULD BE SELECTED [bbcode] text',
        'A trick sentence. And another. THIS SENTENCE SHOULD BE SELECTED
    
    
    [bbcode] text'
    ];
    
    foreach ($tests as $test) {
        // Print the run of non-period text that directly precedes whitespace + [bbcode].
        var_export(preg_match('/ *\K[^.]+?(?=\s+\[bbcode\])/i', $test, $m) ? $m[0] : 'no match');
        echo "\n---\n";
    }
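The demo above tests each example as a separate string. On the combined text required by the question, `[^.]+?` can run across line breaks and the previous tag: for Example 2, `preg_match_all()` with that pattern would match "[bbcode]", the blank lines, "Example 2:" and then the sentence. A hedged sketch of a single-pass variant follows; `findSentences()` is a hypothetical name, and it assumes that sentences contain no square brackets and that a line break ends the previous sentence.

```php
<?php
// Hypothetical variant (not from either answer): finds every sentence that
// directly precedes a [bbcode] tag in a single pass over the whole text.
// Assumes sentences contain no square brackets and end at a period or line break.
function findSentences(string $text): array
{
    $pattern = '/
        [^.\s\[\]]          # first character: no period, whitespace or bracket
        [^.\r\n\[\]]*?      # lazily: anything except a period, line break or bracket
        (?=\s*\[bbcode\])   # stop before optional whitespace and the tag
    /ix';

    preg_match_all($pattern, $text, $matches);

    return $matches[0];
}
```

Excluding brackets stops a match from starting inside the previous tag, and excluding line breaks keeps headings like "Example 2:" out. The trade-off is that a sentence wrapped over several lines would only be matched from its last line.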
    