Javascript - How to make \w token act non-greedy in this RegEx?

cristian
April 29, 2024
192 views
0 votes
2 Answers

I have a text string which contains a repeating pattern, each repetition separated from the next by the . (dot) character. The pattern may end in a _123 (underscore followed by a sequence of digits), and I want to catch those digits in a dedicated capturing group.

The RegEx (ECMAScript) I have built mostly works:
https://regex101.com/r/iEzalU/1

/(label(:|\+))?(\w+)(?:_(\d+))?/gi

However, the (\w+) part acts greedy, and overtakes the (?:_(\d+))? part.

Adding a ? to make \w+ non-greedy (\w+?) works, but now I get a separate match for each character matched by \w.
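To make the two behaviours concrete, here is a minimal sketch. The sample input is made up for illustration, and `show` is a hypothetical helper that prints Group 3 and Group 4 of every match as `name|digits`.

```javascript
// Hypothetical sample input, not taken from the real data.
const sample = "label:sub_1.field_23";

// Greedy \w+: "_" and digits are word characters too, so \w+ swallows
// the "_1" / "_23" suffix and the optional (?:_(\d+))? matches nothing.
const greedy = /(label(:|\+))?(\w+)(?:_(\d+))?/gi;

// Lazy \w+?: nothing forces it to grow, so it stops after one character
// and the optional suffix group only matches when "_digits" follows directly.
const lazy = /(label(:|\+))?(\w+?)(?:_(\d+))?/gi;

// Format each match as "name|digits" (digits is empty when Group 4 is unset).
const show = (re) =>
  [...sample.matchAll(re)].map((m) => `${m[3]}|${m[4] ?? ""}`);

console.log(show(greedy)); // [ 'sub_1|', 'field_23|' ]
console.log(show(lazy));   // [ 's|', 'u|', 'b|1', 'f|', 'i|', 'e|', 'l|', 'd|23' ]
```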

How can I write this regex so that \w+ stays greedy but still does not overtake the _(\d+) part?
Otherwise, is it possible to capture everything matched by the non-greedy \w+? as a single match? (some capturing/non-capturing group magic?)

Answers

- WiktorStribiew
- April 29, 2024 at 5:51 pm
- 0 votes
When creating regular expressions, it is a good idea to think about your expected match boundaries.

You know you need to match substrings in a longer string, so $ and \z can be excluded at once. Digits, letters and underscores are all word characters matched with \w, so you want to match everything up to a character other than a word character (or, potentially, up to the end of the string).

I suggest using
```
(label[:+])?(\w+?)(?:_(\d+))?\b
```
See the regex demo

Details:
- (label[:+])? – an optional Group 1: label and then a : or +
- (\w+?) – Group 2: one or more word chars, as few as possible
- (?:_(\d+))? – an optional sequence of: _ and then one or more digits captured into Group 3
- \b – a word boundary: the next char must be a non-word char, or the end of string must follow.
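As a quick check, the pattern can be run with `matchAll`. This is only a sketch: the sample string is the one used in the snippet from the other answer on this page, and the `label|name|digits` output format is just an illustrative choice.

```javascript
// Sample string borrowed from the other answer's snippet.
const text =
  "group_12.label:sub_1.field_23.label+long_field.label:another.label+long_field_345";

// The trailing \b forces the lazy \w+? to keep expanding until the
// whole item (with or without a "_digits" suffix) has been consumed.
const pattern = /(label[:+])?(\w+?)(?:_(\d+))?\b/gi;

// One "label|name|digits" string per match; unset groups become "".
const rows = [...text.matchAll(pattern)].map(
  (m) => `${m[1] ?? ""}|${m[2]}|${m[3] ?? ""}`
);

console.log(rows);
// [ '|group|12', 'label:|sub|1', '|field|23',
//   'label+|long_field|', 'label:|another|', 'label+|long_field|345' ]
```

Note that `long_field` without a numeric suffix stays in one piece in Group 2, because there is no word boundary inside it.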

- CarstenMassmann
- April 29, 2024 at 6:36 pm
- 0 votes
You can also get the desired result with a simpler regular expression that focuses on the finishing pattern of each group:
```
/(\w+?)_(\d+)(?:\.|$)/gi
```
This may or may not be preceded by a label[+:] group, but that does not need to be expressed in the regular expression.

See the little demo I modified from Wiktor Stribizew’s example:

https://regex101.com/r/pO7OdW/1

Or as a snippet:
console.log([..."group_12.label:sub_1.field_23.label+long_field.label:another.label+long_field_345".matchAll(/(\w+?)_(\d+)(?:\.|$)/gi)].map(r=>r.slice(1)))
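For reference, a slightly unrolled sketch of the same call (the variable names are only illustrative). One thing worth noticing: since `_(\d+)` is mandatory in this pattern, items without a numeric suffix (`long_field` after `label+`, and `another`) produce no match at all, which may or may not be what you want.

```javascript
const data =
  "group_12.label:sub_1.field_23.label+long_field.label:another.label+long_field_345";

// Requires "_digits" followed by a dot or the end of the string.
const suffixOnly = /(\w+?)_(\d+)(?:\.|$)/gi;

// Keep only the two capture groups of each match: [name, digits].
const pairs = [...data.matchAll(suffixOnly)].map((m) => m.slice(1));

console.log(pairs);
// [ [ 'group', '12' ], [ 'sub', '1' ], [ 'field', '23' ], [ 'long_field', '345' ] ]
```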