
I have a requirement to validate input using a regex.
The input must be a comma-separated list of one or more words, such as a tuple (a,b,c) or a longer list (a,b,c,d,e). Whitespace may appear at the start and end of the string and around the commas.
The following are valid:

t1
t2,t1
 t1 , t2 
 a    
 a,b

The following are invalid:

a,
,
<empty>

I came up with this regex:

(\s*(\w+\s*,\s*)*\s*\w+\s*)

Matching works fine, but the regex has polynomial complexity for the attack string '\t'.repeat(1651) + '\t'.repeat(1651) + ',0'
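
To make the slowdown visible, here is a rough timing sketch. It assumes the repeated character in the attack string is a tab (so that it is consumed by the two adjacent whitespace quantifiers), and it uses much smaller sizes than 1651 so the script finishes quickly. `attackInput` and `timeMatch` are illustrative helper names, not part of any library.

```javascript
// Pattern from the question, written with explicit backslashes.
const original = /(\s*(\w+\s*,\s*)*\s*\w+\s*)/;

// Build n tabs followed by ",0" (same shape as the attack string, assumed to use tabs).
function attackInput(n) {
  return "\t".repeat(n) + ",0";
}

// Time a single exec() call in milliseconds.
function timeMatch(regex, input) {
  const start = performance.now();
  regex.exec(input);
  return performance.now() - start;
}

// Sizes are kept small on purpose; doubling n should increase the time
// by clearly more than a factor of two on a backtracking engine.
for (const n of [100, 200, 400]) {
  console.log(n, timeMatch(original, attackInput(n)).toFixed(2) + " ms");
}
```

The absolute numbers depend on the engine and machine; only the growth from one size to the next is of interest.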

I treat an input as valid only if the main group equals the entire input string. That is, I would reject an input like a, even though a subgroup matched part of it.
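
One way to express the "main group equals the whole input" check is to anchor the pattern with ^ and $ instead of comparing the captured group afterwards. This is only a sketch of the validation logic using the examples from this post; `isValidList` is an illustrative name, and the pattern is still the original one, so it keeps the same backtracking problem.

```javascript
// Anchored form of the pattern from the question:
// ^ and $ force the match to cover the whole input string.
const tupleList = /^\s*(\w+\s*,\s*)*\s*\w+\s*$/;

function isValidList(input) {
  return tupleList.test(input);
}

// Examples taken from the question.
const valid = ["t1", "t2,t1", " t1 , t2 ", " a    ", " a,b"];
const invalid = ["a,", ",", ""];

for (const s of valid) console.log(JSON.stringify(s), isValidList(s));   // true for each
for (const s of invalid) console.log(JSON.stringify(s), isValidList(s)); // false for each
```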

Any suggestions for making it safe (linear)? I tried a lookahead approach and lazy quantifiers but could not get either right.

Once I have a safe expression, the end goal is to add a prefix and a suffix while keeping the whole pattern safe, something like:

    PREFIX (\s*(\w+\s*,\s*)*\s*\w+\s*) SUFFIX

I tried something like this, but it stops matching correct inputs:

PREFIX(?=(?(\s*(\w+\s*,\s*)\s\w+\s*)))\k SUFFIX)

With the above, even correct inputs like the one below are not matched:

PREFIX a,b SUFFIX

Thanks..

2 Answers


  1. This seems like the simplest solution:

    (\s*\w+\s*,?)*
    

    The pattern is: any whitespace, followed by one or more word characters, followed by any whitespace, followed by an optional comma. The whole group may repeat zero or more times.

    That site says this is linear and safe.

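    This accepts every input that the regex you provided accepts, but it is more permissive: it also allows a trailing comma, matches the empty string, and, because the comma is optional, accepts words separated only by whitespace (for example "a b").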

    If that doesn’t do what you need, then please provide a more detailed description and more examples of what should pass or fail validation.
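
    A quick way to see what this pattern accepts is to anchor it and run a few inputs through it. This is only a sketch; the ^ and $ anchors and the name `loose` are assumptions added for the test and are not part of the pattern above.

    ```javascript
    // Pattern from this answer, anchored so it must cover the whole input.
    const loose = /^(\s*\w+\s*,?)*$/;

    console.log(loose.test("a,b")); // true
    console.log(loose.test("a,"));  // true: trailing comma is accepted
    console.log(loose.test(""));    // true: empty string is accepted
    console.log(loose.test("a b")); // true: the comma is optional
    console.log(loose.test(","));   // false: a comma needs a word before it
    ```

    Because the comma is optional, a run of word characters can be split across repetitions of the group in many ways, so it may be worth re-checking the anchored pattern against long non-matching inputs before relying on it.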

  2. You could update the pattern to start by matching 1+ word characters, and then optionally repeat a group made of a comma surrounded by optional whitespace chars, followed by 1+ word characters.

    Note that \s can also match newlines.

    PREFIX \w+(?:\s*,\s*\w+)* SUFFIX
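
    As a rough check, the pattern can be tried against the examples from the question. The sketch below makes a few assumptions that are not in the pattern above: ^ and $ anchors are added, leading and trailing whitespace is allowed via \s*, PREFIX and SUFFIX are treated as literal placeholder words separated from the list by at least one whitespace character, and the names `core`, `listOnly` and `withAffixes` are illustrative.

    ```javascript
    // Core list pattern from this answer, with backslashes written out.
    const core = String.raw`\w+(?:\s*,\s*\w+)*`;

    // Assumption: optional whitespace is allowed around the whole list.
    const listOnly = new RegExp(`^\\s*${core}\\s*$`);

    // Assumption: literal PREFIX / SUFFIX words, each separated by whitespace.
    const withAffixes = new RegExp(`^PREFIX\\s+${core}\\s+SUFFIX$`);

    // Examples from the question.
    for (const s of ["t1", "t2,t1", " t1 , t2 ", " a    ", " a,b"]) {
      console.log(JSON.stringify(s), listOnly.test(s)); // true for each
    }
    for (const s of ["a,", ",", ""]) {
      console.log(JSON.stringify(s), listOnly.test(s)); // false for each
    }

    console.log(withAffixes.test("PREFIX a,b SUFFIX")); // true
    console.log(withAffixes.test("PREFIX a, SUFFIX"));  // false

    // Input shaped like the attack string from the question (tabs assumed).
    const attack = "\t".repeat(1651) + "\t".repeat(1651) + ",0";
    console.log(listOnly.test(attack)); // false
    ```

    Each separator here must contain a comma, so there are no two adjacent whitespace quantifiers competing for the same characters, which is the part of the original pattern that caused the backtracking.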
    