
I am trying to build a feature that goes through messages between users and stores all U.S. phone numbers that may have been shared in a message. I want to be very loose about the phone numbers that I store. To do that, I came up with the following regex in PHP (explained part by part below):

"/(?:\+?1[.\s-]*)?(?:\(?\d{1,3}\)[.\s-]*)?(?:\d{3}[.\s-]+)(?:\d{4}[.\s-]*)(?:(ext|ext.|Ext|Ext.|extension|Extension)?[.\s-]*\d{1,6})?|(?:\+?1?\d{10})/",

(?:\+?1[.\s-]*)?: This part handles an optional country code (+1) with an optional separator (dot, space, or hyphen). It’s optional because I want to capture phone numbers without the country code as well.

(?:\(?\d{1,3}\)[.\s-]*)?: This part handles an optional area code enclosed in parentheses.

(?:\d{3}[.\s-]+): This part matches the first three digits of the phone number followed by a separator (can be '.', '-' or spaces).

(?:\d{4}[.\s-]*): This part matches the next four digits of the phone number followed by an optional separator (can be '.', '-' or spaces).

(?:(ext|ext.|Ext|Ext.|extension|Extension)?[.\s-]*\d{1,6})?: This part captures an optional extension (lowercase or capitalized "ext", "ext." or "extension") with an optional separator and up to six digits.

|: This is an alternation operator, allowing the regular expression to match either the pattern before or after it.

(?:\+?1?\d{10}): This part handles an alternative pattern for phone numbers without explicit separators, where there could be an optional country code (+1) and 10 digits.

However, this regex also matches part of the following string:

+44 20 7123 4567, where 123 4567 is the match.
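
The behaviour can be reproduced with a short script. This is only a minimal sketch: it assumes the pattern above with its backslashes in place, and findLooseMatches is a hypothetical helper name used for illustration.

```php
<?php
// Hypothetical helper: return every full match of the loose pattern in $text.
function findLooseMatches(string $text): array
{
    // The loose pattern from the question, unchanged.
    $pattern = '/(?:\+?1[.\s-]*)?(?:\(?\d{1,3}\)[.\s-]*)?(?:\d{3}[.\s-]+)(?:\d{4}[.\s-]*)'
        . '(?:(ext|ext.|Ext|Ext.|extension|Extension)?[.\s-]*\d{1,6})?|(?:\+?1?\d{10})/';

    preg_match_all($pattern, $text, $matches);

    // $matches[0] holds the full matches, in order of appearance.
    return $matches[0];
}

// The UK number is not rejected; its tail is picked up as if it were a 7-digit number.
var_dump(findLooseMatches('+44 20 7123 4567')); // one element: "123 4567"
```

The search is unanchored, so the engine skips over "+44 20 7" and starts the match at the first position where the pattern fits.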

What should I use to avoid capturing this?

2 Answers


  1. Not sure if this matches all your cases, but if you add (?!\+\d{0,2}[^1]) at the beginning, you can ensure that the string doesn’t start with a + symbol followed by up to 2 digits and a character other than 1.
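
    To see what this lookahead accepts, it can be tried on its own. The sketch below assumes the lookahead is anchored with ^ so that it applies to the start of the candidate string (the answer does not say how it is anchored), and passesPrefixCheck is a hypothetical helper name. As written, the check also rejects "+1 718 123 4567", because [^1] can match the separator after the country code, so it would probably need adjusting before use.

    ```php
    <?php
    // Hypothetical helper: does $candidate pass the suggested negative lookahead
    // when it is anchored at the start of the string? (The ^ anchor is an assumption.)
    function passesPrefixCheck(string $candidate): bool
    {
        return preg_match('/^(?!\+\d{0,2}[^1])/', $candidate) === 1;
    }

    var_dump(passesPrefixCheck('+44 20 7123 4567')); // bool(false): foreign prefix rejected
    var_dump(passesPrefixCheck('212-456-7890'));     // bool(true): no leading +
    var_dump(passesPrefixCheck('+1 718 123 4567'));  // bool(false): [^1] matches the space after "+1"
    ```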

  2. It might be possible inside the regular expression, but why not just filter the result in PHP? Not everything has to be solved with a single regular expression.

    One problem here is that a lookbehind assertion (aka "(not) prefixed by …") needs to have a fixed length, but a country code can have different lengths.

    I would suggest first matching anything that could be a phone number. This consumes the characters that would otherwise show up as partial matches. Then iterate over the matches
    and use a specific pattern to match a US phone number in any variant you require.

    Note: In the following example I am using the x (Extended) modifier. This allows the pattern to be formatted, indented and commented.

    $patternPhoneMaybe = '(
        # optional prefixing +
        \+?
        # digits and separator characters
        (?:
          \d+[- .]*
          |
          \(\d+\)[- .]*
        )+
        # optional extension
        (?:
          (?:[eE]xt(?:[.]|ension)\s*)
          \d{1,6}
        )?
    )x';
    
    if (preg_match_all($patternPhoneMaybe, getData(), $matches)) {
        $filtered = array_filter(
            array_map(fn($match) => parseNumberMatch($match), $matches[0]),
            'is_array'
        );
        var_dump($filtered);
    }
    
    
    function parseNumberMatch(string $input): ?array {
    
        $patternPhoneUS = '(
            ^
            # optional country code
            (?<country>(?:00|\+)1)?
            # optional separator
            [- .]?
            # area code
            (?<area>\(\d{3}\)|\d{3})
            # optional separator
            [- .]?
            # 7 digit phone number with optional separator
            (?<number>
              \d{3}
              [- .]?
              \d{4}
            )
            # optional extension
            (?:
              [- ] # mandatory separator
              (?:[eE]xt(?:[.]|ension)\s*)?
              (?<extension>\d{1,6})
            )?
            $
        )x';
    
        if (preg_match($patternPhoneUS, trim($input), $match)) {
            return $match;
        }
        
        return null; 
        
    }
    
    
    function getData() {
    return <<<'TEXT'
    US
    
    +1 718 123 4567
    +1 (718) 123-4567
    +17181234567
    001 (718) 123 4567
    
    Other Country
    
    +23 (718) 123 4567
    0023 (718) 123 4567
    
    +1 (718) 123 4567-1
    +1 (718) 123 4567-123456
    +1 (718) 123 4567 ext.1
    +1 (718) 123 4567 Extension 1
    
    US Variants
    
    2124567890
    212-456-7890
    (212)456-7890
    (212)-456-7890
    212.456.7890
    212 456 7890
    +12124567890
    +12124567890
    +1 212.456.7890
    
    Other Numbers
    
    718
    123.45
    
    TEXT;
    }
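
    Once parseNumberMatch() has returned the named groups, they can be normalised before storing. A possible sketch follows; normalizeUsNumber and the "+1…;ext=…" output format are assumptions chosen for illustration, not part of the approach above.

    ```php
    <?php
    // Hypothetical helper: turn the named groups of one successful match into a
    // canonical "+1XXXXXXXXXX" string, with ";ext=N" appended when an extension exists.
    function normalizeUsNumber(array $match): string
    {
        // Strip parentheses and separators, keeping digits only.
        $area   = preg_replace('/\D/', '', $match['area']);
        $number = preg_replace('/\D/', '', $match['number']);

        $normalized = '+1' . $area . $number;

        // preg_match() omits trailing groups that did not participate in the
        // match, so the key may be missing entirely.
        if (isset($match['extension']) && $match['extension'] !== '') {
            $normalized .= ';ext=' . $match['extension'];
        }

        return $normalized;
    }

    echo normalizeUsNumber(['area' => '(718)', 'number' => '123-4567', 'extension' => '1']), PHP_EOL;
    // prints +17181234567;ext=1
    ```

    Storing one canonical form makes it easier to de-duplicate numbers that were written with different separators.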
    