
I have to deal with Latin plant names and need to style parts of the names coming from the DB. The names are stored as raw text.

Example string: "Androsace angrenica 'Angelica' subsp. Violaceae".

I need to style it like so:

<em>Androsace angrenica</em> 'Angelica' subsp. <em>Violaceae</em>

Some specific words must not be italicized, as shown in the example above; they are listed in the array $toFind.
So far, every single word except the ones in the array ends up wrapped in its own <em></em>, like so:

<em>Androsace</em> <em>angrenica</em> 'Angelica' subsp. <em>Violaceae</em>

I would like to avoid consecutive </em> <em> pairs, as in the first part of the name, and join adjacent words in a single tag pair, as shown in the first example.

# Array of words that must not be wrapped in italics
$toFind = ["subsp.", "var.", "f.", "(voir)", "hybride"];

# Plant name
$name = "Androsace angrenica 'Angelica' subsp. Violaceae";

# Make an array of words from the name
$words = explode( " ", $name );

$newWords = [];

foreach( $words as $key => $word ) {
    if( in_array( $word, $toFind )) {
        $newWords[] =  $word;
    }else{
        # Catch the word or words surrounded by single quotes like 'Angelica'
        $isHybrid = preg_match_all( "/'([^.]*?)'/", $word, $matches, PREG_PATTERN_ORDER );

        if( $isHybrid ){
            # No tags required
            $newWords[] = $word ;
        }else{
            # Tags required for these words
            $newWords[] = "<em>" . $word . "</em>";
        }
    }
}

echo implode(" ", $newWords);

Note that this example name is only one of many possibilities, such as:

Allium obliquum
Allium ostrowkianum (voir) A. oreophilum
Allium senescens subsp. glaucum
Allium sikkimense
Androsace × pedemontana

Thanks!

2 Answers


  1. You could consider processing the implode() result:

    echo str_replace("</em> <em>", " ", implode(" ", $newWords));
    

    This replaces every instance of </em> <em> with a single space after the $newWords array has been imploded, so adjacent italic words end up inside one tag pair.
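
As a minimal sketch of how this fits together with the loop from the question, the whole routine could be wrapped in one function. The function name italicizePlantName is hypothetical, and the quote check is simplified to whole words that start and end with a straight single quote, which is an assumption about the data.

```php
<?php
// Hypothetical helper name; the exclusion list is copied from the question.
function italicizePlantName(string $name, array $toFind): string
{
    $newWords = [];
    foreach (explode(' ', $name) as $word) {
        // Leave listed words and single-quoted cultivar names untagged.
        if (in_array($word, $toFind, true) || preg_match("/^'.*'$/", $word)) {
            $newWords[] = $word;
        } else {
            $newWords[] = '<em>' . $word . '</em>';
        }
    }
    // Merge adjacent tags: "</em> <em>" collapses to a single space.
    return str_replace('</em> <em>', ' ', implode(' ', $newWords));
}

$toFind = ["subsp.", "var.", "f.", "(voir)", "hybride"];
echo italicizePlantName("Androsace angrenica 'Angelica' subsp. Violaceae", $toFind), "\n";
// prints: <em>Androsace angrenica</em> 'Angelica' subsp. <em>Violaceae</em>
```

Because the name is split on spaces, a cultivar name made of several quoted words (for example 'Blue Moon') would not be recognized by this sketch and would need separate handling.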

  2. Your task logic is a blend of literal and non-literal word exclusions. The truth is that you don’t need to explode() the string into a temporary array, compare each word against a blacklist array, then use a regex to conditionally exclude single-quote-wrapped words, then implode the potentially mutated words again.

    It will be much more direct to prepare a single regex pattern with a negated lookahead to exclude disqualified words. Then preg_replace() is the best single-call tool to execute your logic.

    Code: (Demo)

    $blacklist = ["subsp.", "var.", "f.", "(voir)", "hybride"];
    $prepped = array_map('preg_quote', $blacklist);             // escape special characters
    $prepped[] = "'\S+'";                                      // do not escape special characters
    $negLookAhead = '(?!' . implode('|', $prepped) . ')';       // create negated lookahead
    
    $name = "Androsace angrenica 'Angelica' subsp. Violaceae";
    
    // echo "$negLookAhead\n";
    
    echo preg_replace('#(?<=^|\s)' . $negLookAhead . '\S+(?=\s|$)#', '<em>$0</em>', $name);
    

    Output:

    <em>Androsace</em> <em>angrenica</em> 'Angelica' subsp. <em>Violaceae</em>
    

    Not only is this more direct and more concise, if you want to extend the literal or non-literal exclusions, you don’t need to modify the pattern, only the $prepped array.

    Pattern breakdown:

    #            // start of pattern delimiter
    (?<=^|\s)    // check that previous position was start of string or previous character was a whitespace
    (?!          // do not allow match to qualify if any of the following is satisfied
       subsp\.   // literal string match
       |         // OR operator
       var\.     // literal string match
       |         // OR operator
       f\.       // literal string match
       |         // OR operator
       \(voir\)  // literal string match
       |         // OR operator
       hybride   // literal string match
       |         // OR operator
       '\S+'     // single quote, one or more of any non-whitespace character, single quote
    )            // close the negated lookahead logic
    \S+          // match one or more non-whitespace characters
    (?=\s|$)     // check that next character is a whitespace or the next position is the end of string
    #            // end of pattern delimiter
    

    Here is a demo using your entire sample data set.
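
The pattern above still wraps each qualifying word separately. One possible way to get a single tag pair around consecutive qualifying words is to let the pattern match a whole run of them. The following is a sketch under assumptions, not part of the original answer: the function name italicizeRuns is hypothetical, (?<!\S) and (?!\S) are used as equivalents of the boundary checks described above, and the u modifier is added so that characters such as × are treated as UTF-8.

```php
<?php
// Hypothetical helper; the blacklist and lookahead construction follow the answer above.
function italicizeRuns(string $name, array $blacklist): string
{
    // Escape the literal exclusions, including the '#' delimiter.
    $prepped = array_map(fn($w) => preg_quote($w, '#'), $blacklist);
    $prepped[] = "'\S+'";                              // single-quoted words stay a raw pattern
    $word = '(?!' . implode('|', $prepped) . ')\S+';   // one qualifying word

    // One qualifying word, then any further qualifying words separated by whitespace.
    $pattern = '#(?<!\S)' . $word . '(?:\s+' . $word . ')*(?!\S)#u';

    // Fall back to the unchanged name if preg_replace() reports an error (null).
    return preg_replace($pattern, '<em>$0</em>', $name) ?? $name;
}

$blacklist = ["subsp.", "var.", "f.", "(voir)", "hybride"];
echo italicizeRuns("Androsace angrenica 'Angelica' subsp. Violaceae", $blacklist), "\n";
// prints: <em>Androsace angrenica</em> 'Angelica' subsp. <em>Violaceae</em>
echo italicizeRuns("Allium ostrowkianum (voir) A. oreophilum", $blacklist), "\n";
// prints: <em>Allium ostrowkianum</em> (voir) <em>A. oreophilum</em>
```

The run stops as soon as the next word is excluded by the lookahead, so excluded words and quoted cultivar names remain outside the tags.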
