PHP - Loose regex for capturing phone numbers

DavidPham
March 7, 2024

I am trying to build a feature that goes through messages between users and stores every U.S. phone number that may have been shared in a message. I want to be very loose about the phone numbers that I store. To do that, I came up with the following regex in PHP (a clear explanation is given below):

"/(?:\+?1[.\s-]*)?(?:\(?\d{1,3}\)[.\s-]*)?(?:\d{3}[.\s-]+)(?:\d{4}[.\s-]*)(?:(ext|ext.|Ext|Ext.|extension|Extension)?[.\s-]*\d{1,6})?|(?:\+?1?\d{10})/",

(?:\+?1[.\s-]*)?: This part handles an optional country code (+1) with an optional separator (dot, space, or hyphen). It is optional because I want to capture phone numbers without the country code as well.

(?:\(?\d{1,3}\)[.\s-]*)?: This part handles an optional area code enclosed in parentheses.

(?:\d{3}[.\s-]+): This part matches the first three digits of the phone number followed by a separator (can be '.', '-' or spaces).

(?:\d{4}[.\s-]*): This part matches the next four digits of the phone number followed by an optional separator (can be '.', '-' or spaces).

(?:(ext|ext.|Ext|Ext.|extension|Extension)?[.\s-]*\d{1,6})?: This part captures optional extensions (case-insensitive) with an optional separator and up to six digits.

|: This is an alternation operator, allowing the regular expression to match either the pattern before or after it.

(?:\+?1?\d{10}): This part handles an alternative pattern for phone numbers without explicit separators, where there could be an optional country code (+1) and 10 digits.

However, this regex also finds a match inside the following string, which is not a U.S. number:

+44 20 7123 4567 (the match is 123 4567)

What should I use to avoid capturing this?
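
The unwanted match can be reproduced with a short script. This is a minimal sketch, assuming the escapes lost from the pattern above are `\+`, `\(`, `\)`, `\s` and `\d`; the variable names are illustrative only.

```php
<?php
// The pattern from the question, with the assumed backslash escapes in place.
$pattern = '/(?:\+?1[.\s-]*)?(?:\(?\d{1,3}\)[.\s-]*)?(?:\d{3}[.\s-]+)(?:\d{4}[.\s-]*)'
    . '(?:(ext|ext.|Ext|Ext.|extension|Extension)?[.\s-]*\d{1,6})?|(?:\+?1?\d{10})/';

// A UK number: no part of it should be stored as a US number.
preg_match_all($pattern, '+44 20 7123 4567', $ukMatches);

// A US number written with a parenthesised area code.
preg_match_all($pattern, '(212) 456-7890', $usMatches);

var_dump($ukMatches[0]); // the tail "123 4567" is reported as a match
var_dump($usMatches[0]); // the whole number is reported
```

Because the pattern is not anchored, the engine retries at every position and eventually finds a three-digit and four-digit run inside the foreign number.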

Answers

- ValentinMarolf
- March 6, 2024 at 10:52 pm
Not sure if this matches all your cases, but if you add (?!\+\d{0,2}[^1]) at the beginning, you can ensure that the string doesn’t start with a + symbol followed by up to 2 digits and a character other than 1.


It might be possible inside the regular expression, but why not just filter the result in PHP? Not everything has to be solved with a single regular expression.

One problem here is that a lookbehind assertion (aka "(not) prefixed by …") needs to have a fixed length, but a country code can have different lengths.

I would suggest first matching anything that could be a phone number. This consumes the characters that a partial match would otherwise pick up. Then iterate over the matches and use a specific pattern to match a US phone number in any variant you require.
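
To make the "consume first, filter second" idea concrete, here is a minimal sketch. Both patterns are simplified, one-line assumptions written for this illustration, not the full patterns used in this answer, and the sample text is made up.

```php
<?php
// Step 1: a deliberately loose, one-line candidate pattern (illustrative assumption).
$loosePattern = '~\+?(?:\d+[- .]*|\(\d+\)[- .]*)+~';
// Step 2: a strict, anchored US check (also simplified for this sketch).
$strictUsPattern = '~^(?:\+?1[- .]?)?(?:\(\d{3}\)|\d{3})[- .]?\d{3}[- .]?\d{4}$~';

$text = 'Call +44 20 7123 4567 or 212-456-7890';

// The loose pattern swallows the whole UK number as a single candidate.
preg_match_all($loosePattern, $text, $matches);
$candidates = array_map('trim', $matches[0]);

// Only candidates that pass the anchored US check are kept.
$usNumbers = array_values(array_filter(
    $candidates,
    fn(string $candidate): bool => preg_match($strictUsPattern, $candidate) === 1
));

var_dump($candidates);
var_dump($usNumbers);
```

Since the UK number is consumed as one candidate, its tail "123 4567" never reaches the US check on its own.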

Note: The following example uses the x (extended) modifier, which allows you to format, indent, and comment the pattern.

$patternPhoneMaybe = '(
    # optional prefixing +
    \+?
    # digits and separator characters
    (?:
      \d+[- .]*
      |
      \(\d+\)[- .]*
    )+
    # optional extension
    (?:
      (?:[eE]xt(?:[.]|ension)\s*)
      \d{1,6}
    )?
)x';

if (preg_match_all($patternPhoneMaybe, getData(), $matches)) {
    $filtered = array_filter(
        array_map(fn($match) => parseNumberMatch($match), $matches[0]),
        'is_array'
    );
    var_dump($filtered);
}


function parseNumberMatch(string $input): ?array {

    $patternPhoneUS = '(
        ^
        # optional country code
        (?<country>(?:00|\+)1)?
        # optional separator
        [- .]?
        # area code
        (?<area>\(\d{3}\)|\d{3})
        # optional separator
        [- .]?
        # 7 digit phone number with optional separator
        (?<number>
          \d{3}
          [- .]?
          \d{4}
        )
        # optional extension
        (?:
          [- ] # mandatory separator
          (?:[eE]xt(?:[.]|ension)\s*)?
          (?<extension>\d{1,6})
        )?
        $
    )x';

    if (preg_match($patternPhoneUS, trim($input), $match)) {
        return $match;
    }
    
    return null; 
    
}


function getData() {
return <<<'TEXT'
US

+1 718 123 4567
+1 (718) 123-4567
+17181234567
001 (718) 123 4567

Other Country

+23 (718) 123 4567
0023 (718) 123 4567

+1 (718) 123 4567-1
+1 (718) 123 4567-123456
+1 (718) 123 4567 ext.1
+1 (718) 123 4567 Extension 1

US Variants

2124567890
212-456-7890
(212)456-7890
(212)-456-7890
212.456.7890
212 456 7890
+12124567890
+12124567890
+1 212.456.7890

Other Numbers

718
123.45

TEXT;
}
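
Once a candidate has passed the US check, the named groups can be turned into one canonical form before storage. The following is a minimal sketch under stated assumptions: `normalizeUsNumber` and its output format are hypothetical, and the one-line pattern is a compact copy of the US pattern above, repeated only so the snippet runs on its own.

```php
<?php
// Hypothetical helper (not part of the answer above): builds one canonical
// string from the named groups "area", "number" and "extension".
function normalizeUsNumber(array $match): string
{
    // Strip everything but digits from the captured pieces.
    $area   = preg_replace('/\D/', '', $match['area']);
    $number = preg_replace('/\D/', '', $match['number']);

    $normalized = sprintf('+1 (%s) %s-%s', $area, substr($number, 0, 3), substr($number, 3));

    // The extension key is absent when the input had no extension.
    if (($match['extension'] ?? '') !== '') {
        $normalized .= ' ext. ' . $match['extension'];
    }

    return $normalized;
}

// Compact, non-extended copy of the US pattern, kept here so the sketch is self-contained.
$patternPhoneUS = '~^(?<country>(?:00|\+)1)?[- .]?(?<area>\(\d{3}\)|\d{3})[- .]?'
    . '(?<number>\d{3}[- .]?\d{4})(?:[- ](?:[eE]xt(?:[.]|ension)\s*)?(?<extension>\d{1,6}))?$~';

foreach (['+1 (718) 123-4567', '212.456.7890', '001 (718) 123 4567 ext.1'] as $candidate) {
    if (preg_match($patternPhoneUS, $candidate, $match)) {
        echo normalizeUsNumber($match), PHP_EOL;
    }
}
```

Storing a single normalized form makes it easier to deduplicate numbers that users wrote in different styles.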
