
My text is as follows:

9/91 a1 2a cx.papaya 94000
9/92 b2 3a x44b mango 10220
9/93 3 3a x333 pineapple
9/94 x4 cx.apple 94000
9/95 5 55 cyz cx.orange

I'm trying to write a regex that extracts the groups shown in the table below, but it isn't working.

My regex is: ^[0-9/]+.*\s(.*)\s(\d{5})$
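Running the pattern against the sample lines shows where it breaks. This is a minimal diagnostic sketch using Python's re module. It assumes the backslashes that the page appears to have dropped are restored as shown, and the names ASKER_PATTERN and results are illustrative only.

```python
import re

# The question's regex with its backslashes restored (assumption: the page stripped them)
ASKER_PATTERN = re.compile(r"^[0-9/]+.*\s(.*)\s(\d{5})$")

lines = [
    "9/91 a1 2a cx.papaya 94000",
    "9/92 b2 3a x44b mango 10220",
    "9/93 3 3a x333 pineapple",
    "9/94 x4 cx.apple 94000",
    "9/95 5 55 cyz cx.orange",
]

results = {}
for line in lines:
    match = ASKER_PATTERN.match(line)
    # None means the whole line failed to match
    results[line] = match.groups() if match else None
    print(f"{line!r} -> {results[line]}")
```

The pineapple and orange lines do not match because (\d{5})$ is mandatory. On the lines that do match, the first group still contains the cx. prefix, and nothing captures the leading part (9/91 a1 2a).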

This is my expectation:

Group 1            Group 2     Group 3
9/91 a1 2a         papaya      94000
9/92 b2 3a x44b    mango       10220
9/93 3 3a x333     pineapple
9/94 x4            apple       94000
9/95 5 55 cyz      orange


Answers


  1. Something like this might help:

    (\d/\d+\s.\d).+(papaya) ?(\d+)?
    

    You can experiment with regular expressions on sites such as https://regex101.com/.
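As a rough check, here is that pattern run with Python's re module, assuming the backslashes lost on this page are restored as shown. Because (papaya) is a literal word, only the papaya line can match. The variable names are illustrative.

```python
import re

# Answer 1's pattern with backslashes restored (assumed from the garbled page text)
ANSWER1_PATTERN = re.compile(r"(\d/\d+\s.\d).+(papaya) ?(\d+)?")

papaya = ANSWER1_PATTERN.search("9/91 a1 2a cx.papaya 94000")
mango = ANSWER1_PATTERN.search("9/92 b2 3a x44b mango 10220")

# search() returns None when no substring of the line matches
print(papaya.groups() if papaya else None)
print(mango.groups() if mango else None)
```

Group 1 here is 9/91 a1 rather than 9/91 a1 2a, since \s.\d consumes only a single two-character token after the date.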

  2. Here is my attempt:

    ^(\d+\/\d+\hx\d+)\h(?:\w+\.)?(\w+)\h?(\d+)?$
    

    Demo: regex101

    Explanation:

    • ^: start anchor
    • (\d+\/\d+\hx\d+): first capturing group, matches a pattern like 9/91 x1 (one or more digits \d+, a slash escaped as \/, one or more digits \d+, a horizontal whitespace character \h such as a space, the character x, one or more digits \d+)
    • \h(?:\w+\.)?: a space \h followed by an optional non-capturing group that matches a prefix such as cx.
    • (\w+): second capturing group, matches one or more word characters \w+
    • \h?(\d+)?: an optional space \h? followed by the third capturing group (\d+)?, which is also optional
    • $: end anchor

    Update: the OP changed their question, so this is my new attempt:

    Thanks to @The fourth bird for removing the trailing space from the third capturing group.

    ^(\d+\/\d+(?:\h\w+)+)\h(?:\w+\.)?([a-zA-Z]+)(?:\h(\d+))?$
    

    Demo: regex101

    • I added (?:\h\w+)+ to the first capturing group to match several space-separated tokens such as a1 2a after the 9/91 pattern.
    • I changed the second capturing group from \w+ to [a-zA-Z]+ so that it matches only letters.
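A Python check of the updated pattern against the five sample lines is sketched below. Python's re module does not support PCRE's \h, so [ \t] stands in for it here (an assumption that the input only contains spaces or tabs; PCRE's \h also covers some Unicode spaces). The escaped slash \/ is written as a plain / because Python patterns have no delimiters.

```python
import re

# [ \t] replaces PCRE's \h, which Python's re does not support
ANSWER2_PATTERN = re.compile(
    r"^(\d+/\d+(?:[ \t]\w+)+)[ \t](?:\w+\.)?([a-zA-Z]+)(?:[ \t](\d+))?$"
)

lines = [
    "9/91 a1 2a cx.papaya 94000",
    "9/92 b2 3a x44b mango 10220",
    "9/93 3 3a x333 pineapple",
    "9/94 x4 cx.apple 94000",
    "9/95 5 55 cyz cx.orange",
]

groups = []
for line in lines:
    match = ANSWER2_PATTERN.match(line)
    # An unmatched optional group 3 comes back as None
    groups.append(match.groups() if match else None)
    print(f"{line!r} -> {groups[-1]}")
```

The greedy (?:[ \t]\w+)+ first swallows the whole line, then backtracks one token at a time until the prefix, fruit and optional number can match the remainder.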
  3. You forgot to create a group for the first part and to account for the x sequence. You should also make the last part optional and account for the leading optional prefix in your second part. The result of those changes could look like this:

    ^([0-9/]+ x\d) (?:\w+\.)?(\w+)(?: (\d{5}))?$
    

    You can add the lazy group (?: \w+)+? to the first group to account for the additional space-separated sequences in your changed question:

    ^([0-9/]+(?: \w+)+?) (?:\w+\.)?(\w+)(?: (\d{5}))?$
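This pattern uses literal spaces rather than \h, so it should run unchanged in Python's re module. The sketch below, with the backslashes restored as an assumption and illustrative variable names, applies it to the sample lines.

```python
import re

# Answer 3's updated pattern; literal spaces mean no PCRE-only escapes are needed
ANSWER3_PATTERN = re.compile(r"^([0-9/]+(?: \w+)+?) (?:\w+\.)?(\w+)(?: (\d{5}))?$")

lines = [
    "9/91 a1 2a cx.papaya 94000",
    "9/92 b2 3a x44b mango 10220",
    "9/93 3 3a x333 pineapple",
    "9/94 x4 cx.apple 94000",
    "9/95 5 55 cyz cx.orange",
]

captured = []
for line in lines:
    match = ANSWER3_PATTERN.match(line)
    captured.append(match.groups() if match else None)
    print(f"{line!r} -> {captured[-1]}")
```

The lazy +? works from the other direction than a greedy group: it starts with a single token and adds one more only when the rest of the pattern (optional prefix, fruit, optional five-digit number, end of line) fails to match.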
    
  4. Since you also tagged php, I will provide a PHP solution without a regex so you can check it out as an alternative.

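    <?php
    $input = '9/93 3 3a x333 pineapple';

    // Split the line into its space-separated columns
    $splitter = explode(' ', $input);

    // Index of the last column (array_key_last() requires PHP 7.3+)
    $maxArrayPositions = array_key_last($splitter);

    $group3 = '';
    $group1 = '';
    if (is_numeric(end($splitter))) {
        // The last column is the number, so the fruit is the column before it
        $group3 = end($splitter);
        $fruit = explode('.', $splitter[$maxArrayPositions - 1]);
        $group2 = end($fruit); // keep only the part after the last dot
        $counter = 0;
        while ($counter < $maxArrayPositions - 1) {
            $group1 = $group1 . ' ' . $splitter[$counter];
            $counter++;
        }
    } else {
        // No number: the last column is the fruit
        $fruit = explode('.', end($splitter));
        $group2 = end($fruit);
        $counter = 0;
        while ($counter < $maxArrayPositions) {
            $group1 = $group1 . ' ' . $splitter[$counter];
            $counter++;
        }
    }

    // ltrim() removes the leading space added by the concatenation loop
    $group1 = ltrim($group1);

    echo 'group 1 is ' . $group1 . "\n";
    echo 'group 2 is ' . $group2 . "\n";
    echo 'group 3 is ' . $group3 . "\n";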
    

    The output of the group variables is as expected.

    Basically, your strings follow a pattern:

    • the last column is optional and, when present, is a number
    • the column before it (or the last column, when there is no number) is your fruit, with or without a prefix, and you only want the part after the last dot
    • everything before the fruit column is your concatenated string.
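The three rules above can be sketched in Python as well. split_line is a hypothetical helper that mirrors the PHP answer's logic; str.isdigit() stands in for PHP's is_numeric() and is stricter (no signs, decimals or exponents), which is assumed to be enough for this data.

```python
from typing import List, Tuple


def split_line(line: str) -> Tuple[str, str, str]:
    """Hypothetical helper: split one line into (group 1, fruit, number) without a regex."""
    columns = line.split(" ")
    number = ""
    # Rule 1: the last column is optional and, when present, numeric
    if columns[-1].isdigit():
        number = columns.pop()
    # Rule 2: the fruit column; keep only the text after the last dot ("cx.apple" -> "apple")
    fruit = columns.pop().rsplit(".", 1)[-1]
    # Rule 3: everything before the fruit column is group 1
    return " ".join(columns), fruit, number


lines: List[str] = [
    "9/91 a1 2a cx.papaya 94000",
    "9/92 b2 3a x44b mango 10220",
    "9/93 3 3a x333 pineapple",
    "9/94 x4 cx.apple 94000",
    "9/95 5 55 cyz cx.orange",
]

for line in lines:
    print(split_line(line))
```

Joining the remaining columns with " ".join() avoids the leading space that building group 1 by repeated concatenation produces.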

    I hope this helps.

    (Updated my answer based on the latest change to the input in your question.)
