
I need to match strings that may or may not contain a dot (.).

Here are the sample patterns:

res:shanghai:45610
res:chicago.usa:57450

Regex I’m using:

res:[\w]{4,15}:[1-9][0-9]*

It matches only res:shanghai:45610, but it should match both.

Since the second string contains a dot (.) between chicago and usa, it doesn’t match.

How do I alter the regex so that it also matches res:chicago.usa:57450? Specifically, it should allow a single dot in the middle of the word characters, while still restricting the length of the field to a minimum of 4 and a maximum of 15 characters, including the dot.

Above test case on regex101.
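
The behavior described above can be reproduced with a short script. This is only a minimal sketch in Python (an assumption, since the question does not name a language), and it assumes the pattern is meant to match the whole string, hence `re.fullmatch`:

```python
import re

# Pattern from the question; \w matches letters, digits and underscore, but not a dot.
ORIGINAL = re.compile(r"res:[\w]{4,15}:[1-9][0-9]*")

samples = ["res:shanghai:45610", "res:chicago.usa:57450"]

# Record, for each sample, whether the original pattern matches it in full.
results = {s: ORIGINAL.fullmatch(s) is not None for s in samples}

for sample, ok in results.items():
    print(f"{sample!r}: {'match' if ok else 'no match'}")
```

The second sample fails because the character class cannot consume the dot, so the `:` that follows the field is never reached.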

3 Answers


  1. Just add the dot to the character class:

    res:[.\w]{4,15}:[1-9][0-9]*
    

    But it would also match strings with all dots.

    res:....:1
    
  2. Basically, you want the middle part to match both

    [\w.]{4,15}
    

    and

    \w+(?:\.\w+)?
    

    Lookaheads can be used to perform "AND" operations.

    res:(?=[\w.]{4,15}:)\w+(?:\.\w+)?:[1-9][0-9]*
    

    Another possibility is doing the length check outside of the regex.

  3. As mentioned, one can check the length outside of the regex. Since the dot can show up only after a word character and has to be followed by one, the pattern is extremely simple:

    \w+\.?\w+
    

    and altogether, for example

    if ( /res:(\w+\.?\w+):[1-9][0-9]*/ and length $1 >= 4 and length $1 <= 15 )
    

    The one advantage of this is the pattern’s simplicity: it avoids combining lookaheads with consuming patterns, which is a more advanced and complex technique.

    Since Perl v5.32 we can also write the two conditions as a "chained comparison":

    if ( /res:(\w+\.?\w+):[1-9][0-9]*/ and 4 <= length $1 <= 15 )
    

    See also the discussion of this in perlop, under "Operator Precedence and Associativity".

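
The two approaches suggested in the answers (a lookahead that enforces the length, or a length check done outside the regex) can be compared side by side. What follows is a rough sketch in Python rather than Perl. The function names `matches_lookahead` and `matches_length_check` are made up for illustration, and `re.fullmatch` is used on the assumption that the whole string should match.

```python
import re

# Lookahead variant: the lookahead requires 4-15 word characters or dots
# before the next colon; the consuming part allows at most one dot, and
# only between word characters.
LOOKAHEAD = re.compile(r"res:(?=[\w.]{4,15}:)\w+(?:\.\w+)?:[1-9][0-9]*")

# Simpler variant: capture the middle field and check its length in code.
CAPTURE = re.compile(r"res:(\w+\.?\w+):[1-9][0-9]*")


def matches_lookahead(s: str) -> bool:
    """Length and single-dot rules are both enforced inside the regex."""
    return LOOKAHEAD.fullmatch(s) is not None


def matches_length_check(s: str) -> bool:
    """The regex enforces the shape; the length is checked separately."""
    m = CAPTURE.fullmatch(s)
    return m is not None and 4 <= len(m.group(1)) <= 15


samples = [
    "res:shanghai:45610",     # no dot
    "res:chicago.usa:57450",  # one dot in the middle
    "res:....:1",             # only dots
    "res:a.b:1",              # field too short (3 characters)
    "res:a.b.c.d:1",          # more than one dot
]

for s in samples:
    print(s, matches_lookahead(s), matches_length_check(s))
```

Both functions should accept the first two samples and reject the rest. Which one to prefer is mostly a matter of taste: the lookahead keeps everything in one pattern, while the separate length check keeps the pattern easier to read.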