
For instance, say we’re looking at a query string that is all lowercase, with no digits and no special characters (just [a-z] and =, apart from the ? and & delimiters):

?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches

Let us take it as a given that there will be three key-value pairs we wish to capture, and even that they are located at the beginning of said string:

  • some=querystring
  • ssembly=containing
  • n=indeterminate

Now, intuitively, it seems like I should be able to use something like…

^\?(&?[a-z=]+){3}.*$

…or possibly…

^\?(?:&?([a-z=]+)){3}.*$

…but, of course, the only capture this yields is

n=indeterminate
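A quick sketch (assuming a JavaScript engine such as Node or a browser console) shows why this happens: a quantified capturing group is still a single group, and each repetition overwrites the previous capture, so only the last iteration survives.

```javascript
// The query string from the question.
const input = "?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches";

// {3} repeats the group, but it does not create three groups:
// every iteration writes into the same capture slot.
const repeated = /^\?(?:&?([a-z=]+)){3}.*$/;
const m = input.match(repeated);

console.log(m.length); // 2 (the full match plus one capture group)
console.log(m[1]);     // n=indeterminate (only the third iteration is kept)
```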

Is there a syntax that would allow me to capture all three groups (as independent, accessible values, natch) without having to resort to the following?

^\?([a-z=]+)&([a-z=]+)&([a-z=]+).*$

I know there’s no way to capture n instances (an arbitrarily large set), but, given this is a finite number of captures I wish to obtain from my finite automaton…

I know full well there are any number of ways to accomplish this in JavaScript, or any other language for that matter. I’m specifically trying to ascertain whether I’m stuck with the WET expression above.


Answers


  1. There’s no recursion in ECMAScript regular expressions. If you check the reference documentation, you’ll see there’s no recursion operator. You can also check regular-expressions.info, which lists the engines that support recursion: Perl 5.10, PCRE 4.0, Ruby 2.0, Delphi, PHP, and R.
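    As a minimal sketch of what the missing recursion operator means in practice: the pattern below uses PCRE’s (?R) construct, a common way to match balanced parentheses in engines that support recursion. It is an illustrative pattern, not one from the question; JavaScript’s RegExp constructor should reject it with a SyntaxError.

    ```javascript
    // (?R) is PCRE's "recurse into the whole pattern" construct; in PCRE this
    // pattern matches balanced parentheses such as "(a(b)c)".
    const pcrePattern = "\\((?:[^()]|(?R))*\\)";

    let recursionSupported;
    try {
      new RegExp(pcrePattern);
      recursionSupported = true;
    } catch (err) {
      // ECMAScript has no recursion operator, so "(?R" is an invalid group.
      console.log(err instanceof SyntaxError); // true
      recursionSupported = false;
    }

    console.log(recursionSupported); // false
    ```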

  2. JavaScript has no concept of recursion in its regex syntax, but the example you have given is not about recursion; it is about adjacent repetition of the same pattern.

    In that case I would suggest using a regex that matches just one occurrence of the pattern, with the g flag, and using it with matchAll. This returns an iterator, so you consume only the part you need.

    If it is guaranteed that you will have three matches, you can do:

    const input = "?some=querystring&assembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches";
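    // matchAll requires the g flag. Each match is an array whose [0] is the full
    // match, so destructuring the iterator pulls out the first three key=value pairs.
    const [[a], [b], [c]] = input.matchAll(/\w+=\w+/g);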
    console.log(a, b, c);

    This example is tailored to your input. Because matchAll returns an iterator, you can use any of JavaScript’s iterator tools on it (a for...of loop, destructuring assignment, spread syntax, etc.).
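    For instance, here is a minimal sketch that stops after the first three matches with a for...of loop and uses two capture groups to split each pair into key and value. The takeMatches helper is a hypothetical name, not a built-in:

    ```javascript
    // Hypothetical helper: collect at most `count` matches from the lazy
    // iterator returned by String.prototype.matchAll, then stop early.
    function takeMatches(str, regex, count) {
      const results = [];
      if (count <= 0) return results;
      for (const match of str.matchAll(regex)) {
        results.push(match);
        if (results.length === count) break; // don't scan the rest of the string
      }
      return results;
    }

    const input = "?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches";

    // One key=value pair per match; matchAll throws a TypeError without the g flag.
    const pairs = takeMatches(input, /([a-z]+)=([a-z]+)/g, 3)
      .map(([, key, value]) => [key, value]); // drop [0], keep both groups

    console.log(JSON.stringify(pairs));
    // [["some","querystring"],["ssembly","containing"],["n","indeterminate"]]
    ```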

    Alternative: dynamically built regex

    The repetition in the regex that bothers you can be delegated to the String.prototype.repeat() method:

    const input = "?some=querystring&assembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches";
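    // .repeat(3) expands to ([a-z]+=[a-z]+)&?([a-z]+=[a-z]+)&?([a-z]+=[a-z]+)&?
    const regex = RegExp("([a-z]+=[a-z]+)&?".repeat(3));
    // Without the g flag, match() returns the capture groups; skip [0], the whole match.
    const [, ...matches] = input.match(regex);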
    console.log(matches);
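    The same idea can be generalized to any count. This sketch assumes the pairs always sit at the start of the string right after the ?, as the question states; buildPairsRegex is a hypothetical helper. Joining the pair pattern with & (rather than appending &? to each copy) requires a separator between pairs, and anchoring with ^\? makes the match fail instead of drifting to later pairs when the prefix doesn’t fit.

    ```javascript
    // Hypothetical helper: build an anchored pattern for exactly `n` leading
    // key=value pairs, one capture group per pair, separated by "&".
    function buildPairsRegex(n) {
      const pair = "([a-z]+=[a-z]+)";
      return new RegExp("^\\?" + Array(n).fill(pair).join("&"));
    }

    const input = "?some=querystring&ssembly=containing&n=indeterminate&mount=of&ll=potentially&ccordant=matches";

    const regex = buildPairsRegex(3);
    console.log(regex.source);
    // ^\?([a-z]+=[a-z]+)&([a-z]+=[a-z]+)&([a-z]+=[a-z]+)

    const match = input.match(regex);
    // match[0] is the whole matched prefix; the numbered groups follow it.
    const captures = match ? match.slice(1) : [];
    console.log(JSON.stringify(captures));
    // ["some=querystring","ssembly=containing","n=indeterminate"]
    ```

    For the question’s input this yields the same three pairs as the .repeat(3) version, while the separators stay out of the captures.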