How to insert HTML tags and properly wrap words on specific substrings in a string in PHP

MarcDG
January 7, 2024
2 Answers

I have to deal with Latin plant names and need to style parts of the names coming from the DB. The names are stored as raw text.

Example string: "Androsace angrenica 'Angelica' subsp. Violaceae".

I need to style it like so:

<em>Androsace angrenica</em> 'Angelica' subsp. <em>Violaceae</em>

Some specific words must not be italicized, as shown in the example above and listed in the array $toFind.
Here is what I have so far, but I end up with every single word, except the ones in the array, being wrapped in its own <em> tag, like so:

<em>Androsace</em> <em>angrenica</em> 'Angelica' subsp. <em>Violaceae</em>

I would like to avoid consecutive <em> tags, as in the first part of the name, and join those words in a single tag as shown in the first example.

# Array of words not to be wrapped in italics
$toFind = ["subsp.", "var.", "f.", "(voir)", "hybride"];

# Plant name
$name = "Androsace angrenica 'Angelica' subsp. Violaceae";

# Make an array of words from the name
$words = explode( " ", $name );

$newWords = [];

foreach( $words as $key => $word ) {
    if( in_array( $word, $toFind )) {
        $newWords[] =  $word;
    }else{
        # Catch the word or words surrounded  by single quotes like 'Angelica'
        $isHybrid = preg_match_all( "/'([^.]*?)'/", $word, $matches, PREG_PATTERN_ORDER );

        if( $isHybrid ){
            # No tags required
            $newWords[] = $word ;
        }else{
            # Tags required for these words
            $newWords[] = "<em>" . $word . "</em>";
        }
    }
}

echo implode(" ", $newWords);

Note that this example name is only one of many possibilities, such as:

Allium obliquum
Allium ostrowkianum (voir) A. oreophilum
Allium senescens subsp. glaucum
Allium sikkimense
Androsace × pedemontana

Thanks!

Answers

- Wongjn
- January 6, 2024 at 8:43 pm
- 0 votes
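You could consider post-processing the implode() result:
```
echo str_replace("</em> <em>", " ", implode(" ", $newWords));
```
This replaces every instance of "</em> <em>" with a single space after $newWords has been imploded, so adjacent italic words end up inside one <em> element.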
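
To make the suggestion concrete, here is a minimal, self-contained sketch of the question's loop followed by that post-processing step. The anchored quote check (`/^'.*'$/`) and the variable names `$isQuoted` and `$result` are assumptions made for illustration, not part of the original code.

```php
<?php
// Words that must stay outside <em> (taken from the question).
$toFind = ["subsp.", "var.", "f.", "(voir)", "hybride"];
$name = "Androsace angrenica 'Angelica' subsp. Violaceae";

$newWords = [];
foreach (explode(" ", $name) as $word) {
    // Assumption: a cultivar name is a single word wrapped in single quotes.
    $isQuoted = preg_match("/^'.*'$/", $word) === 1;

    if (in_array($word, $toFind, true) || $isQuoted) {
        $newWords[] = $word;                      // left untouched
    } else {
        $newWords[] = "<em>" . $word . "</em>";   // italicized word
    }
}

// Collapse "</em> <em>" between neighbouring italic words into a plain space.
$result = str_replace("</em> <em>", " ", implode(" ", $newWords));
echo $result, "\n";
```

For the sample name this prints `<em>Androsace angrenica</em> 'Angelica' subsp. <em>Violaceae</em>`. Because the string is split on spaces, a quoted cultivar name made of several words would not be recognized by this check.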

Your task logic is a blend of literal and non-literal word exclusions. The truth is that you don’t need to explode() the string into a temporary array, compare each word against a blacklist array, then use a regex to conditionally exclude single-quote-wrapped words, then implode the potentially mutated words again.

It will be much more direct to prepare a single regex pattern with a negated lookahead to exclude disqualified words. Then preg_replace() is the best single-call tool to execute your logic.

Code:

$blacklist = ["subsp.", "var.", "f.", "(voir)", "hybride"];
$prepped = array_map('preg_quote', $blacklist);             // escape special characters
$prepped[] = "'\S+'";                                       // do not escape special characters
$negLookAhead = '(?!' . implode('|', $prepped) . ')';       // create negated lookahead

$name = "Androsace angrenica 'Angelica' subsp. Violaceae";

// echo "$negLookAhead\n";

echo preg_replace('#(?<=^|\s)' . $negLookAhead . '\S+(?=\s|$)#', '<em>$0</em>', $name);

Output:

<em>Androsace</em> <em>angrenica</em> 'Angelica' subsp. <em>Violaceae</em>

Not only is this more direct and more concise, but if you want to extend the literal or non-literal exclusions, you don’t need to modify the pattern, only the $blacklist or $prepped array.
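
As an illustration of extending the exclusions, the sketch below adds the hybrid sign "×" to the literal list so that it stays outside the tags in a name like "Androsace × pedemontana". Whether "×" should be excluded is an assumption here, as are the `u` modifier and the variable names `$pattern` and `$italic`.

```php
<?php
// Literal exclusions from the answer, plus "×" (an assumed addition).
$blacklist = ["subsp.", "var.", "f.", "(voir)", "hybride", "×"];
$prepped = array_map('preg_quote', $blacklist);   // escape regex metacharacters
$prepped[] = "'\S+'";                             // non-literal rule: quoted cultivar name
$negLookAhead = '(?!' . implode('|', $prepped) . ')';

// The u modifier treats pattern and subject as UTF-8, which matters for "×".
$pattern = '#(?<=^|\s)' . $negLookAhead . '\S+(?=\s|$)#u';

$italic = preg_replace($pattern, '<em>$0</em>', "Androsace × pedemontana");
echo $italic, "\n";
```

This prints `<em>Androsace</em> × <em>pedemontana</em>`. Only the list changed; the pattern-building code stayed the same.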

Pattern breakdown:

#            // start of pattern delimiter
(?<=^|\s)    // check that previous position was start of string or previous character was a whitespace
(?!          // do not allow match to qualify if any of the following is satisfied
   subsp\.   // literal string match
   |         // OR operator
   var\.     // literal string match
   |         // OR operator
   f\.       // literal string match
   |         // OR operator
   \(voir\)  // literal string match
   |         // OR operator
   hybride   // literal string match
   |         // OR operator
   '\S+'     // single quote, one or more of any non-whitespace character, single quote
)            // close the negated lookahead logic
\S+          // match one or more non-whitespace characters
(?=\s|$)     // check that next character is a whitespace or the next position is the end of string
#            // end of pattern delimiter
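
The pattern above matches one word at a time, so neighbouring italic words each get their own tag. One possible way to produce a single tag per run of words, as the question asks, is to let the match continue across whitespace as long as the next word also passes the lookahead. The following is a sketch under that assumption; the helper name `italicizeName` and the `$samples` array are hypothetical and not part of the original answer.

```php
<?php
/**
 * Hypothetical helper: wraps each run of consecutive non-excluded words
 * in a single <em> element.
 */
function italicizeName(string $name, array $blacklist): string
{
    $prepped = array_map('preg_quote', $blacklist);   // escape literal exclusions
    $prepped[] = "'\S+'";                             // quoted cultivar names
    $word = '(?!' . implode('|', $prepped) . ')\S+';  // one word that may be italicized

    // First qualifying word, then any number of "whitespace + qualifying word".
    $pattern = '#(?<=^|\s)' . $word . '(?:\s+' . $word . ')*(?=\s|$)#';

    // Fall back to the unchanged input if preg_replace() reports an error (null).
    return preg_replace($pattern, '<em>$0</em>', $name) ?? $name;
}

$blacklist = ["subsp.", "var.", "f.", "(voir)", "hybride"];
$samples = [
    "Androsace angrenica 'Angelica' subsp. Violaceae",
    "Allium ostrowkianum (voir) A. oreophilum",
    "Allium senescens subsp. glaucum",
];

foreach ($samples as $sample) {
    echo italicizeName($sample, $blacklist), "\n";
}
```

For the first sample this prints `<em>Androsace angrenica</em> 'Angelica' subsp. <em>Violaceae</em>`, which is the markup requested in the question.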

Here is a demo using your entire sample data set.
