I have the following string
$string = "5A3BB0020221209DES.ACT";
And running the following Regex
preg_match_all('/(00)|(?<!^)(?<date>2\d{7}|\d{2}.\d{2}.\d{2}|\d{8}|\d{6})/', $string, $m);
When I dump the output of $m['date'], I get an array like this:
array(2) {
[0]=>
string(0) ""
[1]=>
string(8) "20221209"
}
I only want the second result. If I remove the match group for (00), or there simply isn’t a match for group (00), I don’t get this extra blank string. Why are other match groups polluting the date match group results with blank strings? I tried adding more match groups, and each one that found a match added another blank string to the results of date. I could make my code ignore all the extra blank matches, but that seems like it should be unnecessary. The examples in the preg_match_all docs show this exact same behavior, but I didn’t see any explanation of why it happens or how to get rid of it.
2 Answers
You likely want to be using a non-capturing group, which is (?:...). E.g.:
/(?:00)|(?<!^)(?<date>2\d{7}|\d{2}.\d{2}.\d{2}|\d{8}|\d{6})/
Although I am not sure that the expression does what you think it does. E.g.: if the input contains 00, the first alternative matches that and only that, so the date group is still empty for that match. I would wager that the following is more what you might be after:
(?<!^)(?:00)?(?<date>(?:2\d{7}|\d{2}.\d{2}.\d{2}|\d{8}|\d{6}))
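As a quick check, here is a minimal sketch that runs the revised expression against the string from the question. The variable names ($pattern, $m) are only for illustration, and the dots are left unescaped exactly as in the expression above.

```php
<?php
$string = "5A3BB0020221209DES.ACT";

// The optional non-capturing (?:00)? consumes the "00" prefix as part of
// the same overall match instead of producing a separate match of its own.
$pattern = '/(?<!^)(?:00)?(?<date>(?:2\d{7}|\d{2}.\d{2}.\d{2}|\d{8}|\d{6}))/';

preg_match_all($pattern, $string, $m);

var_dump($m[0]);      // the overall matches
var_dump($m['date']); // the date group, one entry per overall match
```

With this input there should be a single overall match, so the date group holds a single entry and no blank string.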
Which works out like: (diagram of the expression, via Debuggex)
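Because this string has two matches, "00" and "20221209". preg_match_all gives every group one entry per overall match, so the "00" match, in which the date group took no part, shows up in $m['date'] as an empty string.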
You may not be aware that the alternation operator has the lowest precedence of all the regex operators. You probably wanted: "00" OR a lookbehind asserting that this is not the beginning, followed by what you’re interested in. Instead you got: "00" on its own is a complete match, OR not-the-beginning followed by 8 digits is a match.
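To see the two separate matches, here is a small sketch using the pattern and string from the question. The $dates variable and the filtering step are my own additions for illustration, a stopgap rather than a fix for the pattern itself.

```php
<?php
$string = "5A3BB0020221209DES.ACT";

// Original pattern: the top-level | splits the whole expression in two,
// so "(00)" alone is one complete alternative.
$original = '/(00)|(?<!^)(?<date>2\d{7}|\d{2}.\d{2}.\d{2}|\d{8}|\d{6})/';
preg_match_all($original, $string, $m);

// $m[0] lists every overall match; "00" is one of them.
var_dump($m[0]);

// Each group array gets one slot per overall match, so the "00" match
// leaves an empty string in the date group.
var_dump($m['date']);

// Stopgap: drop the empty slots and reindex.
$dates = array_values(array_filter($m['date'], static function ($s) {
    return $s !== '';
}));
var_dump($dates);
```

Filtering works, but restructuring the pattern so that "00" can no longer match by itself avoids the blank entries in the first place.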
I’m guessing what you really want is something like