
I have a PHP function that splits product names from their color names in WooCommerce.
The full string generally has the form "product name – product color", for example:

"Boxer Welbar – light grey" splits into "Boxer Welbar" and "light grey"

"Longjohn Gari – marine stripe" splits into "Longjohn Gari" and "marine stripe"

But in some cases the string can be "Tee-shirt – product color", and then the split doesn’t work as I want, because the "-" in Tee-shirt is also detected as a delimiter.

How can I circumvent this problem? Should I use a lookahead in the regexp?

function product_name_split($prod_name) {
    $currenttitle = strip_tags($prod_name);

    $splitted = preg_split("/–|[\p{Pd}\xAD]|(–)/", $currenttitle);

    return $splitted;
} 

5 Answers


  1. I’d go for a negative lookahead.

    Something like this:

    -(?!.*-)
    

    This matches a hyphen that is not followed by any other hyphen, i.e. the last hyphen in the string.

    It works as long as the color name never contains a hyphen.
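
    As a minimal sketch of how that lookahead could be used with preg_split(): the helper name split_on_last_hyphen is hypothetical, and the example assumes the delimiter is a plain ASCII hyphen. The \s* on both sides is an addition so that the surrounding spaces are consumed with the delimiter.

    ```php
    <?php
    // Hypothetical helper (the name is illustrative, not from the question).
    // Assumes the delimiter is an ASCII hyphen.
    function split_on_last_hyphen(string $title): array
    {
        // -(?!.*-) matches a hyphen that has no other hyphen after it;
        // \s* on both sides swallows the whitespace around it.
        return preg_split('/\s*-(?!.*-)\s*/', $title);
    }

    print_r(split_on_last_hyphen('Tee-shirt - light grey'));
    // Array ( [0] => Tee-shirt [1] => light grey )
    ```

    A title without any hyphen comes back as a single-element array.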

  2. As usual, there are several options for this. Here is one of them:

    • explode — Split a string by a string

    • end — Set the internal pointer of an array to its last element

    $currenttitle = 'Tee-shirt - product color';
    
    $array = explode( '-', $currenttitle );
    
    echo end( $array );
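
    The snippet above only prints the color, with a leading space. If both parts are needed, one possible variation (a sketch, not part of the original answer) is to cut the string at the last hyphen with strrpos() and trim each half. The helper name split_name_and_color is hypothetical, and an ASCII hyphen is assumed as the delimiter.

    ```php
    <?php
    // Hypothetical helper: returns [name, color], or [title] when no hyphen exists.
    function split_name_and_color(string $title): array
    {
        // Position of the last ASCII hyphen, or false if there is none.
        $pos = strrpos($title, '-');
        if ($pos === false) {
            return [trim($title)];
        }

        return [
            trim(substr($title, 0, $pos)),   // everything before the last hyphen
            trim(substr($title, $pos + 1)),  // everything after it
        ];
    }

    print_r(split_name_and_color('Tee-shirt - product color'));
    // Array ( [0] => Tee-shirt [1] => product color )
    ```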
    
  3. What about counting space characters that surround a dash?

    For example:

    
    function product_name_split($prod_name) {
      $currenttitle = strip_tags($prod_name);
    
      $splitted = preg_split("/\s(–|[\p{Pd}\xAD]|(–))\s/", $currenttitle);
    
      return $splitted;
    }
    

    This automatically trims spaces from split parts as well.

  4. If you have " - " as the delimiter (note the spaces around the dash), you may simply use explode(...). If not, use

    \s*-(?=[^-]+$)\s*
    

    or

    \w+-\w+(*SKIP)(*FAIL)|-
    

    with preg_split(), see the demos on regex101.com (#2)


    In PHP this could be:

    <?php
    $strings = ["Tee-shirt - product color", "Boxer Welbar - ligth grey", "Longjohn Gari - marine stripe"];
    
    foreach ($strings as $string) {
        print_r(explode(" - ", $string));
    }
    
    foreach ($strings as $string) {
        print_r(preg_split("~\s*-(?=[^-]+$)\s*~", $string));
    }
    ?>
    

    Both approaches will yield

    Array
    (
        [0] => Tee-shirt
        [1] => product color
    )
    Array
    (
        [0] => Boxer Welbar
        [1] => ligth grey
    )
    Array
    (
        [0] => Longjohn Gari
        [1] => marine stripe
    )
    

    To collect the split items, use array_map(...):

    $splitted = array_map( function($item) {return preg_split("~\s*-(?=[^-]+$)\s*~", $item); }, $strings);
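
    The second pattern listed earlier in this answer, the one using (*SKIP)(*FAIL), is not exercised by the PHP code shown here. A minimal sketch of how it might be used follows; the helper name split_skipping_hyphenated_words is hypothetical, and the trim() step is an addition, because that pattern on its own does not consume the spaces around the delimiter.

    ```php
    <?php
    // Hypothetical helper: ignore hyphens inside words, split on any other hyphen.
    function split_skipping_hyphenated_words(string $title): array
    {
        // \w+-\w+(*SKIP)(*FAIL) consumes a hyphenated word such as "Tee-shirt"
        // and then discards that match, so only the remaining hyphens split.
        $parts = preg_split('/\w+-\w+(*SKIP)(*FAIL)|-/', $title);

        // The pattern leaves the surrounding spaces in place, so trim each part.
        return array_map('trim', $parts);
    }

    print_r(split_skipping_hyphenated_words('Tee-shirt - product color'));
    // Array ( [0] => Tee-shirt [1] => product color )
    ```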
    
  5. Your sample inputs convey that the neighboring whitespace around the delimiting hyphen/dash is just as critical as the hyphen/dash itself.

    I recommend doing all of the html and special entity decoding before executing your regex — that’s what these other functions are built for and it will make your regex pattern much simpler to read and maintain.

    \p{Pd} will match any hyphen/dash. Reinforce the business logic in the code by declaring a maximum of 2 elements to be generated by the split.

    As a general rule, I discourage declaring single-use variables.

    Code: (Demo)

    function product_name_split($prod_name) {
        return preg_split(
            "/ \p{Pd} /u",
            strip_tags(
                html_entity_decode(
                    $prod_name
                )
            ),
            2
        );
    }
    
    $tests = [
        'Tee-shirt - product color',
        'Boxer Welbar - ligth grey',
        'Longjohn Gari - marine stripe',
        'En dash – green',
        'Entity &#8211; blue',
    ];
    
    foreach ($tests as $test) {
        echo var_export(product_name_split($test), true) . "\n";
    }
    

    Output:

    array (
      0 => 'Tee-shirt',
      1 => 'product color',
    )
    array (
      0 => 'Boxer Welbar',
      1 => 'ligth grey',
    )
    array (
      0 => 'Longjohn Gari',
      1 => 'marine stripe',
    )
    array (
      0 => 'En dash',
      1 => 'green',
    )
    array (
      0 => 'Entity',
      1 => 'blue',
    )
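
    To illustrate what the limit of 2 changes, here is a small sketch with a made-up title whose color part itself contains a spaced dash. The input string is an assumption chosen for illustration and does not come from the question.

    ```php
    <?php
    // Illustrative input (not from the question): the color contains " - " too.
    $title = 'Polo - navy - white';

    // No limit: every spaced dash splits, giving three elements.
    $unlimited = preg_split('/ \p{Pd} /u', $title);

    // Limit of 2: only the first spaced dash splits; the rest stays in the color.
    $limited = preg_split('/ \p{Pd} /u', $title, 2);

    print_r($unlimited); // Array ( [0] => Polo [1] => navy [2] => white )
    print_r($limited);   // Array ( [0] => Polo [1] => navy - white )
    ```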
    