inefficient regular expression in javascript

bionicsparv
August 10, 2023
74 views
1 vote
2 Answers

Hi in our below code using codeql scanning got an alert that "This part of the regular expression may cause exponential backtracking on strings starting with ‘0’ and containing many repetitions of ‘0’."

const validateUrl = str => {
  var pattern = new RegExp(
    "^(https?:\/\/)?" + // protocol
    "((([a-z\d]([a-z\d-]*[a-z\d])*)\.)+[a-z]{2,}|" + // domain name

please let me know how can we resolve this and what exactly the problem?

Answers

- NoName
- August 10, 2023 at 12:23 pm
- 0 votes
0
The problem is that the regular expression you’re using has a pattern that can cause exponential backtracking on certain strings. This means that for some strings, the regular expression engine may end up trying a huge number of possible paths through the pattern, resulting in slow performance and a possible denial of service if an attacker can control the input string.

The part of the expression that’s triggering this warning is likely this bit:
```
([a-z\d]([a-z\d-]*[a-z\d])*)
```
This pattern is trying to match a series of alphanumeric characters and dashes, but the way it’s structured can cause issues. The nested repetition (* inside of another *) is what creates the risk of exponential backtracking.

You can usually rewrite the pattern to avoid the nested repetition. In this case, you might rewrite the pattern to something like this:
```
const validateUrl = str => {
  var pattern = new RegExp(
    "^(https?:\/\/)?" + // protocol
    "((([a-z\d]+[-])*[a-z\d]+\.)+[a-z]{2,}|" // domain name
    // ... rest of pattern
  );
  // ... rest of code
}
```
Here, I’ve changed the pattern to use ([a-z\d]+[-])* to match one or more alphanumeric characters followed by a dash, repeated zero or more times, and then followed by [a-z\d]+, which matches one or more alphanumeric characters.

This new pattern should still match valid domain names like the old pattern but without the risk of exponential backtracking. You may want to test it on various inputs to make sure it still behaves as intended.
Login or Signup to reply.

- Luatic
- August 10, 2023 at 1:28 pm
- 0 votes
0
As the other answer has correctly identified, the problem is ([a-zd-]*[a-zd])*. This RegEx recognizes a superset of ([a-zd]+)*. The nesting of the two quantifiers gives a naive backtracking RegEx implementation too many options to try: Consider a long string consisting of a‘s (or 0‘s, or any other character in [a-zd-]) followed by $. The RegEx engine will first try matching a single a, then matching further according to the *. If this fails, it will later try matching two a‘s and so on. The same will be repeated afterwards. Ultimately your RegEx engine will have tried all options of partitioning the a‘s into substrings of length > 0, of which there are superexponentially many.

Note that the enclosing (....)+ is not an issue, since the dot serves as a delimiter.

Your first option, which wouldn’t require changing the RegEx at all, would simply be to switch to a RegEx engine with guaranteed linear time complexity. This flaw is only exhibited by backtracking RegEx engines and not by RegEx engines which use DFAs (Deterministic Finite Automatons). There seem to be NPM packages which can convert RegEx to DFA.

Your second option is to rewrite the RegEx. [a-zd]([a-zd-]*[a-zd])* matches any sequence of alphanumerics and dashes as long as the dashes aren’t at the end or beginning; thus we can convert the outer * quantifier into a ? quantifier: [a-zd]([a-zd-]*[a-zd])?.
Alternatively, we could also write it as [a-zd]+(-+[a-zd]+)*. This nests quantifiers, but the dashes serve as a delimiter, so this shouldn’t exhibit catastrophic backtracking either.

You’re not showing the full RegEx, but it is most likely insufficient to validate URLs. A proper, spec-compliant RegEx to validate URLs is rather lengthy. Take a look at the syntax diagram on Wikipedia; you’re missing entire parts of it (like userinfo, port, query, fragment). Peter Seliger’s comment is spot on:

The OP might consider delegating such a validation task to an Web-Api URL compatible node implementation.

Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.