I am using PHP 7.4 as well as PHP 8.2, and I have a regex that I use to match words (names). To be completely honest, I barely recognize this regex monster I created, so I'm asking for help figuring it out. It is basically this:
$is_word = preg_match('/^(?![aeiou]{3,})(?:\D(?![^aeiou]{4,}[aeiou]*)(?![aeiou]{4,})){3,}$/i', $name);
I've been using it for 6+ years to match names in a script I created. It basically returns TRUE or FALSE (strictly speaking, preg_match() returns 1 or 0) depending on whether the name matches a word pattern.
But today it returned FALSE on two names that should be deemed valid:
- Li
- Drantch
To test this, you can use the following batch of test names (pseudonyms, for example's sake):
- Nartinez
- Drantch
- Dratch
- Xtmnprwq
- Yelendez
- Boldberg
- Yelenovich
- Allash
- Mohamed
- Li
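A minimal harness for running that batch might look like the sketch below. It assumes the pattern's non-digit token is \D (a bare D with /i would only ever match strings made of d's); the $names array and the output format are just for illustration.

```php
<?php
// The question's pattern, assuming the token before the lookaheads is \D.
$pattern = '/^(?![aeiou]{3,})(?:\D(?![^aeiou]{4,}[aeiou]*)(?![aeiou]{4,})){3,}$/i';

// The test names from the question ("Xtmnprwq" is a deliberate negative).
$names = ['Nartinez', 'Drantch', 'Dratch', 'Xtmnprwq', 'Yelendez',
          'Boldberg', 'Yelenovich', 'Allash', 'Mohamed', 'Li'];

foreach ($names as $name) {
    // preg_match() returns 1 for a match, 0 for no match, false on error.
    $is_word = preg_match($pattern, $name);
    printf("%-10s %s\n", $name, $is_word === 1 ? 'match' : 'no match');
}
```

With this pattern, Drantch, Xtmnprwq and Li are reported as "no match" and the other seven names as "match".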
I attempted to adjust the regex by changing the second quantifier, {4,}, to {5,}:
$is_word = preg_match('/^(?![aeiou]{3,})(?:\D(?![^aeiou]{5,}[aeiou]*)(?![aeiou]{4,})){3,}$/i', $name);
That helped with names like “Drantch”, but it still completely missed two-letter names like “Li”.
How can this regex be tweaked to match all names? If that is not possible, how can it be adjusted to match “Drantch” and other obvious names other than “Li”?
Note that “Xtmnprwq” is a fake test name, so I can test negatives as well as positives.
3 Answers
The {3,} in your non-capturing group mandates a minimum string length of 3 characters. If you want to allow Li, reduce it to {2,}.
The negated character class inside your negative lookahead, (?![^aeiou]{4,}[aeiou]*), requires at least 4 consecutive consonants, so "ntch" satisfies it and disqualifies the input string. If you want to allow Drantch, increase it to {5,}, i.e. (?![^aeiou]{5,}[aeiou]*).
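A minimal sketch with both changes applied together ({2,} for the length, {5,} for the consonant lookahead), run over the question's test names. Like the question's script, it assumes the non-digit token in the pattern is \D.

```php
<?php
// Minimum length lowered to {2,}; consonant-run limit raised to {5,}.
$pattern = '/^(?![aeiou]{3,})(?:\D(?![^aeiou]{5,}[aeiou]*)(?![aeiou]{4,})){2,}$/i';

$names = ['Nartinez', 'Drantch', 'Dratch', 'Xtmnprwq', 'Yelendez',
          'Boldberg', 'Yelenovich', 'Allash', 'Mohamed', 'Li'];

foreach ($names as $name) {
    printf("%-10s %s\n", $name, preg_match($pattern, $name) === 1 ? 'match' : 'no match');
}
```

Only Xtmnprwq should be rejected now: after its first letter it has seven consonants in a row.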
As for improving your pattern's readability, it would be better to express your exact intentions: write one negative lookahead per rule, placed before the "core" requirement that all characters must be letters and that the string meets a minimum length.
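A minimal sketch of that idea, assuming "letters" means ASCII a to z and reusing the limits discussed in this answer (at least 2 letters, no run of 5+ consonants, no run of 4+ vowels, no 3+ leading vowels). The x modifier lets each rule sit on its own line with a comment. Because each lookahead scans the whole string from the start with .*, a consonant run beginning at the very first letter is counted too, which the per-character lookaheads in the original never check; that makes this version slightly stricter.

```php
<?php
// Hypothetical restructuring: one start-anchored lookahead per rule,
// then the "core" requirement. The x flag ignores whitespace and allows comments.
$pattern = '/^
    (?![aeiou]{3})      # must not start with 3 or more vowels
    (?!.*[^aeiou]{5})   # no run of 5 or more consecutive consonants
    (?!.*[aeiou]{4})    # no run of 4 or more consecutive vowels
    [a-z]{2,}           # core: ASCII letters only, at least 2 of them
$/ix';

$names = ['Nartinez', 'Drantch', 'Dratch', 'Xtmnprwq', 'Yelendez',
          'Boldberg', 'Yelenovich', 'Allash', 'Mohamed', 'Li'];

foreach ($names as $name) {
    printf("%-10s %s\n", $name, preg_match($pattern, $name) === 1 ? 'match' : 'no match');
}
```

Note that the core [a-z]{2,} also rejects spaces, hyphens and apostrophes, which \D would have allowed; widen that class if names like "O'Brien" must pass.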
Your regexp has the following constraints on words:
- ^(?![aeiou]{3,}): can't begin with 3 or more consecutive vowels
- (?![^aeiou]{4,}[aeiou]*): can't have 4 or more consecutive consonants after the first character
- (?![aeiou]{4,}): can't have 4 or more consecutive vowels after the first character
- {3,}: must be at least 3 characters long
Li violates the 3-character requirement. Drantch violates the 4-consecutive-consonants restriction. Tweak or remove these bits of the regexp to change the restrictions so that these names are allowed.
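One way to make those bits easy to tweak is to build the pattern from parameters. name_pattern() below is a hypothetical helper, not something from the question; it reproduces the posted pattern exactly when given the question's values and assumes the non-digit token is \D.

```php
<?php
// Hypothetical helper: rebuilds the question's pattern with each
// restriction from the list above exposed as a parameter (\D assumed).
function name_pattern(int $leadingVowels, int $consonantRun, int $vowelRun, int $minLength): string
{
    return sprintf(
        '/^(?![aeiou]{%d,})(?:\D(?![^aeiou]{%d,}[aeiou]*)(?![aeiou]{%d,})){%d,}$/i',
        $leadingVowels, $consonantRun, $vowelRun, $minLength
    );
}

$original = name_pattern(3, 4, 4, 3);  // the values from the question
$relaxed  = name_pattern(3, 5, 4, 2);  // longer consonant runs, 2-letter minimum

foreach (['Drantch', 'Li', 'Xtmnprwq'] as $name) {
    printf("%-10s original=%d relaxed=%d\n", $name,
        preg_match($original, $name), preg_match($relaxed, $name));
}
```

Positional arguments are used so the helper also runs on PHP 7.4, which lacks named arguments.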
To understand what your pattern does, you can use visual tools like https://regex101.com/r/vICSfO/1.
To help us help you, I recommend describing the business logic behind the pattern, i.e. some practical use case.
For example, your regex looks overly complicated to me, but perhaps you need it exactly as it is for some reason.
At first glance, it can be simplified.
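One possible simplification, offered as a sketch rather than as what this answer had in mind: a trailing [aeiou]* inside a lookahead can always match the empty string, so it never changes the result, and inside a negative lookahead {4,} behaves exactly like {4} (if four or more follow, then exactly four follow). The snippet compares both forms on the question's names, assuming the non-digit token is \D.

```php
<?php
$original   = '/^(?![aeiou]{3,})(?:\D(?![^aeiou]{4,}[aeiou]*)(?![aeiou]{4,})){3,}$/i';
// Same behaviour: redundant [aeiou]* and open-ended lookahead quantifiers removed.
$simplified = '/^(?![aeiou]{3})(?:\D(?![^aeiou]{4})(?![aeiou]{4})){3,}$/i';

$names = ['Nartinez', 'Drantch', 'Dratch', 'Xtmnprwq', 'Yelendez',
          'Boldberg', 'Yelenovich', 'Allash', 'Mohamed', 'Li'];

foreach ($names as $name) {
    $a = preg_match($original, $name);
    $b = preg_match($simplified, $name);
    // Flag any name where the two patterns disagree.
    printf("%-10s original=%d simplified=%d%s\n", $name, $a, $b, $a === $b ? '' : '  <- differs');
}
```

Every name should produce the same result under both patterns; the simplification only removes noise.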
At the very least, you need to replace {3,} with {2,} if you need to match two-character words.