I have a PHP function that splits product names from their color names in WooCommerce.
The full string is generally of the form "product name – product color", for example:
"Boxer Welbar – light grey" splits into "Boxer Welbar" and "light grey"
"Longjohn Gari – marine stripe" splits into "Longjohn Gari" and "marine stripe"
But in some cases it can be "Tee-shirt – product color", and in this case the split doesn’t work as I want, because the "-" in Tee-shirt is detected.
How can I circumvent this problem? Should I use a lookahead in the regexp?
function product_name_split($prod_name) {
    $currenttitle = strip_tags($prod_name);
    $splitted = preg_split("/–|[\p{Pd}\xAD]|(–)/", $currenttitle);
    return $splitted;
}
5 Answers
I’d go for a negative lookahead: search for a – that is not followed by any other –.
This works as long as the color name never contains a –.
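A minimal sketch of that idea, not necessarily the exact pattern this answer had in mind. The function name `split_on_last_dash` is made up for illustration, and `\p{Pd}` (any Unicode dash) is used so that both the hyphen in "Tee-shirt" and the en dash are candidates, with only the last one accepted:

```php
<?php
// Hypothetical helper: split on the LAST dash in the string only.
function split_on_last_dash(string $name): array
{
    // \p{Pd}          any Unicode dash or hyphen
    // (?!.*\p{Pd})    negative lookahead: no further dash may follow
    // \s*             absorbs whitespace around the delimiter
    // u               treat pattern and subject as UTF-8
    return preg_split('/\s*\p{Pd}(?!.*\p{Pd})\s*/u', $name);
}

print_r(split_on_last_dash('Tee-shirt – light grey'));
```

The hyphen in "Tee-shirt" is skipped because another dash follows it. As noted above, a dash inside the color name would defeat this.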
As usual, there are several options for this. One of them relies on the two functions below:
explode — Split a string by a string
end — Set the internal pointer of an array to its last element
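One way these two functions could be combined is sketched here. The helper name `split_with_explode` is hypothetical, and the sketch assumes the delimiter is always the en dash (U+2013), so the plain hyphen in "Tee-shirt" is never touched:

```php
<?php
// Hypothetical helper: treat the text after the last en dash as the color.
function split_with_explode(string $name): array
{
    $parts = explode('–', $name);   // split on every en dash
    $color = trim(end($parts));     // end() returns the last element

    // Re-join whatever came before the last part, in case the
    // product name itself contained an en dash.
    $product = trim(implode('–', array_slice($parts, 0, -1)));

    return [$product, $color];
}

print_r(split_with_explode('Tee-shirt – light grey'));
```

If the string contains no en dash at all, the whole string ends up as the color and the product name is empty, so a real implementation would probably want to check for that case.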
What about taking into account the space characters that surround a dash?
Splitting on the dash together with its surrounding whitespace also trims the spaces from the split parts automatically.
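A short sketch of this approach, assuming the delimiter always has at least one whitespace character on each side. The function name `split_on_spaced_dash` is invented for the example:

```php
<?php
// Hypothetical helper: a dash only counts as a delimiter when it is
// surrounded by whitespace on both sides.
function split_on_spaced_dash(string $name): array
{
    // \s+ is part of the match, so the surrounding spaces are
    // consumed and the resulting parts need no trimming.
    return preg_split('/\s+\p{Pd}\s+/u', $name);
}

print_r(split_on_spaced_dash('Tee-shirt – light grey'));
```

The hyphen in "Tee-shirt" has no whitespace around it, so it is left alone.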
If you have " - " as the delimiter (note the spaces around the dash), you may simply use explode(...). If not, use a pattern with preg_split(); see the demos on regex101.com. Both approaches will yield the same parts. To collect the split items, use array_map(...).
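The explode(...) variant combined with array_map(...) might look like the sketch below. The function name `split_all` and its `$delimiter` parameter are assumptions made for the example, not part of the original answer:

```php
<?php
// Hypothetical helper: split every name in a list on a fixed delimiter.
function split_all(array $names, string $delimiter = ' - '): array
{
    return array_map(
        static function (string $name) use ($delimiter): array {
            // Limit of 2: everything after the first delimiter
            // stays together in the second element.
            return explode($delimiter, $name, 2);
        },
        $names
    );
}

print_r(split_all(['Tee-shirt - light grey', 'Boxer Welbar - marine stripe']));
```

Because the delimiter includes the spaces, the hyphen inside "Tee-shirt" does not match. Pass ' – ' as the second argument if the data uses an en dash.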
Your sample inputs convey that the neighboring whitespace around the delimiting hyphen/dash is just as critical as the hyphen/dash itself.
I recommend doing all of the HTML and special entity decoding before executing your regex. That’s what the dedicated decoding functions are built for, and it will make your regex pattern much simpler to read and maintain.
\p{Pd} will match any hyphen/dash. Reinforce the business logic in the code by declaring a maximum of 2 elements to be generated by the split.
As a general rule, I discourage declaring single-use variables.
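Put together, these recommendations could look roughly like this. It is a sketch, not the original demo code: the name `product_name_split_decoded` is invented, and using html_entity_decode() for the decoding step is an assumption about which function was meant:

```php
<?php
// Hypothetical rewrite of the question's function following the advice above.
function product_name_split_decoded(string $prod_name): array
{
    // No single-use variables: the cleaned string is passed straight in.
    return preg_split(
        '/\s+\p{Pd}\s+/u',   // any dash, but only when surrounded by whitespace
        html_entity_decode(strip_tags($prod_name), ENT_QUOTES | ENT_HTML5, 'UTF-8'),
        2                    // at most 2 elements: product name and color
    );
}

print_r(product_name_split_decoded('<b>Tee-shirt</b> &#8211; light grey'));
```

With the limit of 2, a second spaced dash inside the color name stays in the color instead of producing a third element.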