
I’m working with 4 to 8 digit numeral strings, range 0001 to 99999999. Examples are:

  • 0010
  • 877565
  • 90204394

I need to check whether a numeral string can be formed out of a defined set. Think of it as a Scrabble bag of loose characters. The set contains:

  • 2 times 0 (00)
  • 4 times 1 (1111)
  • 3 times 2 (222)
  • 2 times 3 (33)
  • 3 times 4 (444)
  • 5 times 5 (55555)
  • 2 times 6 (66)
  • 5 times 7 (77777)
  • 2 times 8 (88)
  • 2 times 9 (99)

With this defined set of numerals, the string 0010 cannot be formed because it has 1 zero too many: it needs 3 but the set only provides 2. Outcome should be: false.

In contrast, the string 90204394 can be formed because the defined set provides a sufficient number of each numeral. It falls within parameters; desired output: true.

I thought to carry out the check by means of regex because that will return either true or false, which is perfect in this case. I came up with the following:

preg_match('/(0{0,2}1{0,4}2{0,3}3{0,2}4{0,3}5{0,5}6{0,2}7{0,5}8{0,2}9{0,2})/', $string);

Unfortunately I end up with the outcome that every tested string outputs true, even when it clearly cannot be formed; like 08228282 (as it contains one 8 and one 2 too many).

What am I missing here?

2 Answers


  1. Scott,

    It looks like regex is not what you're looking for.
    Your pattern returns true because it is unanchored and every quantifier allows zero repetitions, so it can always match something, even an empty string. It also reads the digits sequentially, in ascending order, rather than counting them across the whole string.
    For example, with 08228282 it first matches the single 0, which satisfies {0,2}; the quantifiers for 1 through 7 each match zero times; then it matches the single 8 that follows, which also satisfies {0,2}. The match stops there, and nothing else is validated, because everything else is allowed to occur 0 times.
    Another example: with 877565 only the leading 8 is validated.
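
    To see this in practice, here is a small sketch that prints what the original pattern actually consumes. The pattern is copied unchanged from the question; the input 'abc' is an extra test string added here only to show that even input without digits produces a match.

    ```php
    <?php
    // The asker's original pattern, unchanged.
    $pattern = '/(0{0,2}1{0,4}2{0,3}3{0,2}4{0,3}5{0,5}6{0,2}7{0,5}8{0,2}9{0,2})/';

    foreach (['08228282', '877565', 'abc'] as $input) {
        $result = preg_match($pattern, $input, $m);
        // $m[0] holds the text the pattern consumed, which may be empty.
        printf("%s => %d, matched '%s'\n", $input, $result, $m[0]);
    }
    // 08228282 => 1, matched '08'
    // 877565 => 1, matched '8'
    // abc => 1, matched ''
    ```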

    I think the solution you need does not involve regex, since you only need the total number of occurrences of each digit.
    Consider counting the occurrences of each digit in the string instead. Try something like this:

    function validate($number) {
      // validate total length
      $count = strlen($number);
      if ($count < 4 || $count > 8) {
        return false;
      }
    
      // validate occurrences of 0 (0 to 2 occurrences)
      $count = substr_count($number, '0');
      if ($count > 2) {
        return false;
      }
    
      // validate occurrences of 1 (0 to 4 occurrences)
      $count = substr_count($number, '1');
      if ($count > 4) {
        return false;
      }
    
      // validate occurrences of the other digits in the same way
      // ...
    
      // if no check failed, the string is valid
      return true;
    }
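
    The repeated per-digit checks above could also be collapsed into a loop over a table of limits. The following is only a sketch: the function name canBeFormed() and the $limits array are made up for illustration, the limits are taken from the question, and it assumes that only strings of 4 to 8 ASCII digits count as valid input.

    ```php
    <?php
    // Hypothetical helper: true when $number can be formed from the bag of digits.
    function canBeFormed(string $number): bool
    {
        // Maximum occurrences per digit, indexed by the digit itself (from the question).
        $limits = [2, 4, 3, 2, 3, 5, 2, 5, 2, 2];

        // Assumption: input must consist of 4 to 8 digits and nothing else.
        $length = strlen($number);
        if (!ctype_digit($number) || $length < 4 || $length > 8) {
            return false;
        }

        foreach ($limits as $digit => $max) {
            if (substr_count($number, (string) $digit) > $max) {
                return false;
            }
        }

        return true;
    }

    var_dump(canBeFormed('0010'));      // bool(false): three 0s, only two available
    var_dump(canBeFormed('90204394'));  // bool(true)
    var_dump(canBeFormed('08228282'));  // bool(false): too many 2s and 8s
    ```

    Returning real booleans rather than the strings 'True' and 'False' matters here, because any non-empty string other than '0' is truthy in PHP.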
    
  2. I don’t understand the format of your whitelisted number counts, but if they are hardcoded as posted in your question, you can use full-string lookaheads to validate the count of each whitelisted digit.

    $tests = [
        '0010',
        '877565',
        '90204394',
        '867530999',
        '1234567890',
        '6',
        '555555',
    ];
    
    $regex = <<<REGEX
    /
    (?=^(?:[^0]*0){0,2}[^0]*$)
    (?=^(?:[^1]*1){0,4}[^1]*$)
    (?=^(?:[^2]*2){0,3}[^2]*$)
    (?=^(?:[^3]*3){0,2}[^3]*$)
    (?=^(?:[^4]*4){0,3}[^4]*$)
    (?=^(?:[^5]*5){0,5}[^5]*$)
    (?=^(?:[^6]*6){0,2}[^6]*$)
    (?=^(?:[^7]*7){0,5}[^7]*$)
    (?=^(?:[^8]*8){0,2}[^8]*$)
    (?=^(?:[^9]*9){0,2}[^9]*$)
    /x
    REGEX;
    
    foreach ($tests as $test) {
        printf("%s => %s\n", $test, preg_match($regex, $test) ? 'true' : 'false');
    }
    

    Output:

    0010 => false
    877565 => true
    90204394 => true
    867530999 => false
    1234567890 => true
    6 => true
    555555 => false
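
    Note that this pattern checks only the per-digit counts, which is why 6 and 1234567890 report true even though the question limits strings to 4 to 8 digits. If the length matters, one possible approach is to anchor once and consume 4 to 8 digits after the lookaheads. The sketch below assumes that valid input contains digits only; the variable name $lookaheads is made up for illustration.

    ```php
    <?php
    // Maximum occurrences per digit, indexed by the digit itself (from the question).
    $numMaxes = [2, 4, 3, 2, 3, 5, 2, 5, 2, 2];

    $lookaheads = '';
    foreach ($numMaxes as $num => $max) {
        // Counting lookahead per digit; %1$d is the digit, %2$d its limit.
        $lookaheads .= sprintf('(?=(?:[^%1$d]*%1$d){0,%2$d}[^%1$d]*$)', $num, $max);
    }

    // Assumption: valid input is exactly 4 to 8 digits, so that is what gets consumed.
    $regex = '/^' . $lookaheads . '\d{4,8}$/D';

    foreach (['0010', '877565', '6', '1234567890'] as $test) {
        printf("%s => %s\n", $test, preg_match($regex, $test) ? 'true' : 'false');
    }
    // 0010 => false
    // 877565 => true
    // 6 => false
    // 1234567890 => false
    ```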
    

    If you are building your regex from an array of numbers and their counts, then you can use:

    $numMaxes = [2, 4, 3, 2, 3, 5, 2, 5, 2, 2];
    $regex = '';
    foreach ($numMaxes as $num => $max) {
        $regex .= "(?=^(?:[^$num]*$num){0,$max}[^$num]*$)";
    }
    $regex = "/$regex/";
    

    Alternatively, you could build a pattern that matches when any digit’s maximum is exceeded, then invert the boolean result of the preg_match() call used in the first snippet.

    $numMaxes = [2, 4, 3, 2, 3, 5, 2, 5, 2, 2];
    foreach ($numMaxes as $num => $max) {
        $subpatterns[] = "(?:[^$num]*$num){" . ($max + 1) . "}";
    }
    $regex = "/(?=" . implode('|', $subpatterns) . ")/";
    
    foreach ($tests as $test) {
        printf("%s => %sn", $test, preg_match($regex, $test) ? 'false' : 'true');
    }
    

    Or

    $numMaxes = [2, 4, 3, 2, 3, 5, 2, 5, 2, 2];
    $subpatterns = [];
    foreach ($numMaxes as $num => $max) {
        $subpatterns[] = "(?=(?:.*$num){" . ($max + 1) . "})";
    }
    $regex = "/" . implode('|', $subpatterns) . "/";
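
    This variant is used in the same inverted way: a match means that some digit exceeds its limit. A minimal usage sketch, with the pattern rebuilt so the block runs on its own and three test strings taken from the question:

    ```php
    <?php
    $numMaxes = [2, 4, 3, 2, 3, 5, 2, 5, 2, 2];
    $subpatterns = [];
    foreach ($numMaxes as $num => $max) {
        // For digit 0 this yields (?=(?:.*0){3}), i.e. "at least three 0s ahead".
        $subpatterns[] = "(?=(?:.*$num){" . ($max + 1) . "})";
    }
    $regex = "/" . implode('|', $subpatterns) . "/";

    foreach (['0010', '90204394', '08228282'] as $test) {
        // A match signals a violation, so the printed result is inverted.
        printf("%s => %s\n", $test, preg_match($regex, $test) ? 'false' : 'true');
    }
    // 0010 => false
    // 90204394 => true
    // 08228282 => false
    ```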
    

    Or with negated lookaheads for an exceeded count limit:

    $numMaxes = [2, 4, 3, 2, 3, 5, 2, 5, 2, 2];
    $regex = '';
    foreach ($numMaxes as $num => $max) {
        $regex .= "(?!(?:.*$num){" . ($max + 1) . "})";
    }
    $regex = "/^$regex.*$/";
    
    foreach ($tests as $test) {
        printf("%s => %s\n", $test, preg_match($regex, $test) ? 'true' : 'false');
    }
    

    This can be done without regex as well. For each input string, split the string into an array of digits, count the occurrences of each digit, keep only the digits whose count exceeds its limit, then check whether that result is empty.

    $tests = [
        '0010',
        '877565',
        '90204394',
        '867530999',
        '1234567890',
        '6',
        '555555',
        '08228282',
    ];
    
    $numLimits = [2, 4, 3, 2, 3, 5, 2, 5, 2, 2];
    var_export(
        array_map(
            fn($v) => !array_filter(
                array_count_values(str_split($v)),
                fn($count, $num) => $numLimits[$num] < $count,
                ARRAY_FILTER_USE_BOTH
            ),
            $tests
        )
    );
    

    I guess what I am saying is that there will be many ways to skin this cat.
