I have a requirement to validate input using a regex.
The requirement is to match a comma-separated string in the form of a tuple (a,b,c), or a longer one (a,b,c,d,e), where whitespace can occur before or after each item.
The following are valid:
t1
t2,t1
t1 , t2
a
a,b
The following are invalid:
a,
,
<empty>
I came up with this regex:
(\s*(\w+\s*,\s*)*\s*\w+\s*)
The matching works fine, but it has polynomial complexity for the attack string 't'.repeat(1651) + 't'.repeat(1651) + ',0'
I consider an input as matching only if the main group equals the whole input string. That is, I would reject an input like a, even though a subgroup matched.
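To make the acceptance rule concrete, here is a minimal Python sketch of that check. It assumes the expression above is meant to contain \s and \w (the backslashes appear to have been lost), and the helper name accepts is only illustrative. It deliberately uses short inputs, since this is the expression with the slow worst case.

```python
import re

# Assumption: the expression from the question with \s and \w restored.
ORIGINAL = re.compile(r"(\s*(\w+\s*,\s*)*\s*\w+\s*)")

def accepts(text: str) -> bool:
    """Accept only if the main group (group 1) equals the whole input."""
    m = ORIGINAL.match(text)
    return m is not None and m.group(1) == text

valid = ["t1", "t2,t1", "t1 , t2", "a", "a,b"]
invalid = ["a,", ",", ""]

for s in valid + invalid:
    print(repr(s), accepts(s))
```

Comparing group 1 with the input is equivalent to requiring a match of the whole string, which is what re.fullmatch does in Python.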
Any suggestions to make it safe/linear? I tried a lookahead approach and lazy quantifiers but could not get it right.
Once this expression is safe, the end goal is to add a prefix and a suffix while keeping it safe,
something like:
PREFIX (\s*(\w+\s*,\s*)*\s*\w+\s*) SUFFIX
I was trying something like this, but it stops matching correct inputs:
PREFIX(?=(?(\s*(\w+\s*,\s*)\s\w+\s*)))\k SUFFIX)
With the above, even correct inputs like the one below are not matched:
PREFIX a,b SUFFIX
Thanks..
2 Answers
This seems like the simplest solution:
The pattern is: any whitespace, followed by one or more word characters, followed by any whitespace, followed by an optional comma. The pattern may repeat zero or more times.
That site says this is linear and safe.
This will do the same thing as the regex you provided, except that it allows a trailing comma and also matches the empty string.
If that doesn’t do what you need, then please provide a more detailed description and more examples of what should pass or fail validation.
You could update the pattern by first matching 1+ word characters, and then optionally repeating a group that consists of a comma between optional whitespace chars followed by 1+ word characters.
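As a rough illustration of that description, here is a Python sketch. The exact pattern is my reconstruction from the sentence above, not necessarily the one originally posted, and the names ITEMS, TUPLE_RE, WRAPPED_RE and is_valid are hypothetical. Because every repetition has to start with a comma, there is only one way to split a run of word characters, which is what avoids the ambiguous backtracking.

```python
import re

# Assumed reconstruction: 1+ word chars, then zero or more
# "optional whitespace, comma, optional whitespace, 1+ word chars".
ITEMS = r"\w+(?:\s*,\s*\w+)*"

# Optional whitespace is allowed around the whole list.
TUPLE_RE = re.compile(r"\s*" + ITEMS + r"\s*")

# Same core between a literal prefix and suffix (placeholders from the question).
WRAPPED_RE = re.compile(r"PREFIX\s*" + ITEMS + r"\s*SUFFIX")

def is_valid(text: str) -> bool:
    # fullmatch requires the entire input to match, so "a," is rejected.
    return TUPLE_RE.fullmatch(text) is not None

print(is_valid("t1 , t2"))                 # True
print(is_valid("a,"))                      # False
print(is_valid("t" * 3302 + ",0"))         # True (the attack string from the question)
print(WRAPPED_RE.fullmatch("PREFIX a,b SUFFIX") is not None)  # True
```

Using fullmatch (or explicit ^ and $ anchors in other engines) matters here: an unanchored search would retry the match from every start position, which brings back extra work on long non-matching inputs.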
Note that using \s can also match newlines.
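If newlines should not be accepted, one option is to replace \s with a class that excludes line breaks. The sketch below uses [^\S\r\n] (whitespace other than CR and LF) in Python; the names H, SINGLE_LINE_RE and is_valid_single_line are illustrative, and the pattern shape is an assumption based on the description above.

```python
import re

# Whitespace except carriage return and line feed.
H = r"[^\S\r\n]"

# Same structure as before, with \s swapped for the narrower class.
SINGLE_LINE_RE = re.compile(rf"{H}*\w+(?:{H}*,{H}*\w+)*{H}*")

def is_valid_single_line(text: str) -> bool:
    return SINGLE_LINE_RE.fullmatch(text) is not None

print(is_valid_single_line("a ,\tb"))   # True: spaces and tabs are still allowed
print(is_valid_single_line("a,\nb"))    # False: the newline is no longer accepted
```

If only plain spaces and tabs are expected, [ \t] would be a simpler alternative to the negated class.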