JavaScript - Unable to group parts of the regex together to make the intended operator precedence explicit

AmanChitransh
September 12, 2024
228 views
0 votes
2 Answers

SonarQube is flagging one of my regexes and asks me to group parts of it to make the intended operator precedence explicit. After I add the grouping, the regex no longer works as expected.

SonarQube Issue:
SonarQube flags that the regex should have grouped parts to make the operator precedence clear.

Current Regex: /^(\W)|([^a-zA-Z0-9_.]+)|(\W)$/g
This regex is meant to validate a string based on the following conditions:

Requirements:

If the string starts or ends with a dot, it should throw an error immediately.
If the string contains any symbol other than A-Z, a-z, 0-9, underscore, or dot (where dots can only appear in between), it should throw an error.
The string should only contain A-Z, a-z, 0-9, underscore, or dots (dots can’t appear at the start or end but are allowed in between).

Note:
The existing logic is designed to throw an error if the regex matches. Therefore, I need a regex that matches whenever any of the conditions above is violated, without modifying the existing logic, because it’s part of a reusable codebase.
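To make that contract concrete, here is a minimal JavaScript sketch. The helpers `rejectIfMatches` and `accepts` are hypothetical stand-ins for the reusable code, which the question does not show; the sketch assumes that code simply throws when the pattern matches anywhere in the input.

```javascript
// Hypothetical stand-in for the shared "throw if the regex matches" logic.
function rejectIfMatches(value, pattern) {
  // String.prototype.match returns null when there is no match.
  if (value.match(pattern) !== null) {
    throw new Error(`Invalid value: ${JSON.stringify(value)}`);
  }
  return value;
}

// The question's current regex: a leading non-word character (this includes "."),
// any character outside [a-zA-Z0-9_.], or a trailing non-word character is a match.
const invalidName = /^(\W)|([^a-zA-Z0-9_.]+)|(\W)$/g;

// Convenience wrapper: true if the value passes, false if the check throws.
function accepts(value) {
  try {
    rejectIfMatches(value, invalidName);
    return true;
  } catch {
    return false;
  }
}

console.log(accepts("abc.def")); // true
console.log(accepts(".abc"));    // false: leading dot
console.log(accepts("abc."));    // false: trailing dot
console.log(accepts("ab-c"));    // false: "-" is not allowed
```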

I attempted the following regex /^(\.)|([^a-zA-Z0-9_.]+)|(.*\.$)/g, but I’m concerned it might still trigger the SonarQube warning about operator precedence.

How can I properly structure this regex to meet these conditions and avoid SonarQube warnings?

Answers

- VincentF
- September 10, 2024 at 7:35 pm
- 0 votes
Regex:
```
/^\.?([^\p{L}_.\r\n]+)\.$|^\.([^\p{L}_.\r\n]+)\.?$/gmu
```
Explanation:
1. Anchors ^ and $:
  - ^ at the start ensures that the pattern matches from the beginning of the string.
  - $ at the end ensures that the pattern matches until the end of the string.
2. Optional starting dot (^\.?):
  - \.? matches an optional dot at the beginning of the string. This allows strings to start with a dot, but it is not required.
3. Character class in the middle ([^\p{L}_.\r\n]+):
  - ([^\p{L}_.\r\n]+) is a capture group that matches one or more characters that are not in the specified set:
    
    ^\p{L} means anything that is not a letter. \p{L} matches any Unicode letter, which is broader than a-zA-Z: it also covers accented letters like é or ä, so those are excluded as well.
    
    _ excludes underscores.
    
    . excludes dots.
    
    \r\n (optional, depending on whether you parse a text with many lines or a single-line string) excludes carriage return and newline characters, to prevent matching across line breaks.
4. Required ending dot (\.$):
  - \.$ ensures that the string ends with a dot.
5. Pipe | operator for alternation:
  - The regex uses |, which means OR, so either of two patterns can match:
    
    ^\.?([^\p{L}_.\r\n]+)\.$ matches a string that optionally starts with a dot, has one or more characters in the middle that are not letters, underscores, dots, or line breaks, and ends with a dot.
    
    ^\.([^\p{L}_.\r\n]+)\.?$ matches a string that starts with a dot, has the same kind of characters in the middle, and optionally ends with a dot.
6. Options /gmu:
  - g: global, to get all the matches, not just the first.
  - m: multi-line, so that ^ and $ match at each line when the input text has many lines.
  - u: Unicode, required to use \p{L}.
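A quick way to check the points about \p{L} and the u flag is a small sketch like the following (the sample characters are arbitrary). It also shows that digits are not letters, so the negated class used in this answer matches digits as well, even though the question allows them.

```javascript
// \p{L} (any Unicode letter) is only recognized in regexes with the "u" flag.
const unicodeLetter = /\p{L}/u;
const asciiLetter = /[a-zA-Z]/;

for (const ch of ["a", "Z", "é", "ä", "1", "_", "."]) {
  // é and ä are letters for \p{L} but not for [a-zA-Z]; 1, _ and . are letters for neither.
  console.log(ch, unicodeLetter.test(ch), asciiLetter.test(ch));
}

// The answer's negated class: digits are not letters, so it matches them too.
const notInSet = /[^\p{L}_.\r\n]/u;
console.log(notInSet.test("5")); // true
console.log(notInSet.test("é")); // false
```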
Why not capture 3 groups like in your example

Capturing 3 groups (the middle part and the two dots), as below, is not a good idea: which groups actually capture something depends on the input (it could be 2 or 3 of them), so further on in your code you would have to deal with that variable set of captured groups.

Anyway, here is how you could do it:
```
/^(\.)?([^\p{L}_.\r\n]+)(\.)$|^(\.)([^\p{L}_.\r\n]+)(\.?)$/gmu
```
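In JavaScript the number of groups is actually fixed by the pattern (six here, three per alternative); what varies is which of them take part in a given match. A small sketch with exec shows this; the g and m flags are dropped because each sample is a single-line string checked on its own, and the sample inputs are arbitrary.

```javascript
// The capturing version from the answer, without "g" and "m".
const withGroups = /^(\.)?([^\p{L}_.\r\n]+)(\.)$|^(\.)([^\p{L}_.\r\n]+)(\.?)$/u;

for (const input of [".!@#$.", "!@#$.", ".!@#$"]) {
  const m = withGroups.exec(input);
  // m.length is always 7 (whole match + 6 groups). Groups that did not take
  // part in the match are undefined; (\.?) yields "" when there is no dot.
  console.log(input, m.length, m.slice(1));
}
```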
Tip

If the goal is to get only the whole match, just remove the capture groups:
```
/^\.?[^\p{L}_.\r\n]+\.$|^\.[^\p{L}_.\r\n]+\.?$/gmu
```
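With the g and m flags, this version can be applied to a multi-line text in one go. A minimal sketch, with input lines taken from the test examples in this answer: with g, String.prototype.match returns only the whole matches, so dropping the groups loses nothing here.

```javascript
// Non-capturing version: only the whole match matters.
const invalidLine = /^\.?[^\p{L}_.\r\n]+\.$|^\.[^\p{L}_.\r\n]+\.?$/gmu;

const text = [".!@#$.", "abc.def", ".@@.", "!!@!"].join("\n");

// "g": return every whole match (or null if there is none).
// "m": ^ and $ match at the start and end of each line.
console.log(text.match(invalidLine)); // [ '.!@#$.', '.@@.' ]
```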
Tested with

Non matching examples
```
a!@#.
.!@_.
abc.def
!!@!
.é*"+.
```
Matching examples
```
.!@#$.
.!@#$
.@@.
.!#$.
```
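The examples above can be checked with a short script like this one. It is a sketch that tests each string separately, so the g and m flags are dropped (with g, RegExp.prototype.test would carry lastIndex over from one call to the next).

```javascript
// Same pattern as the tip above, but only with "u": one string at a time.
const pattern = /^\.?[^\p{L}_.\r\n]+\.$|^\.[^\p{L}_.\r\n]+\.?$/u;

const nonMatching = ["a!@#.", ".!@_.", "abc.def", "!!@!", '.é*"+.'];
const matching = [".!@#$.", ".!@#$", ".@@.", ".!#$."];

for (const s of nonMatching) console.log(s, pattern.test(s)); // false for each
for (const s of matching) console.log(s, pattern.test(s));    // true for each
```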
Test link

regex101

- trincot
- September 12, 2024 at 12:34 pm
- 0 votes
Your current regex is correct: it will find a match when the input is not in line with the requirements.

The SonarQube warning you refer to is probably RSPEC-5850: Alternatives in regular expressions should be grouped when used with anchors

This rule tackles a common mistake that is made when combining ^ or $ with |. However, this is not a mistake that you have made. To make absolutely clear that you intended the ^ to only apply to the first alternative (and not all of them), and the $ to only apply to the last alternative (and not all of them), the suggestion here is to put ^ inside a group, and to do the same for $. Your current regex still leaves those out of the groups.
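The mistake the rule targets is easiest to see in a small, made-up example (the words cat and dog are purely illustrative):

```javascript
// Without a group, ^ binds only to the first alternative and $ only to the last.
const ungrouped = /^cat|dog$/;
// With a non-capturing group, both anchors apply to every alternative.
const grouped = /^(?:cat|dog)$/;

console.log(ungrouped.test("catalog")); // true: "^cat" matches at the start
console.log(grouped.test("catalog"));   // false: the whole string must be "cat" or "dog"
console.log(ungrouped.test("hotdog"));  // true: "dog$" matches at the end
console.log(grouped.test("hotdog"));    // false
```

In the question's regex, by contrast, having ^ apply only to the first alternative and $ only to the last is intended; the groups just make that explicit.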

Note that you don’t really need to put the middle alternative in a group, as there you don’t use the ^ or $ assertions.

Secondly, the suggestion is not to make capture groups, but just groups. So use (?: ) instead of ( ), and make sure you put ^ and $ inside them.

Unrelated to the warning, but your regex doesn’t need the + quantifier: finding one invalid character is enough, and it doesn’t matter whether more consecutive invalid characters follow. Also, you can use \w (equivalent to [A-Za-z0-9_]) to shorten the character class.

Applying these changes, we get:
```
/(?:^\W)|[^\w.]|(?:\W$)/g
```
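As a sanity check, a sketch like the following runs the question's original regex and the grouped one over a few hand-picked strings and compares the verdicts. The `isRejected` helper and the sample list are illustrative assumptions, not part of the question's codebase.

```javascript
// The question's current regex and the grouped version from this answer.
const current = /^(\W)|([^a-zA-Z0-9_.]+)|(\W)$/g;
const grouped = /(?:^\W)|[^\w.]|(?:\W$)/g;

// String.prototype.match restarts a "g" regex from index 0 on every call,
// so reusing these shared regex objects is safe here. (RegExp.prototype.test
// on a "g" regex would instead resume from the previous lastIndex.)
function isRejected(re, value) {
  return value.match(re) !== null;
}

const samples = ["abc.def", "a.b_c9", "a..b", ".abc", "abc.", "a-b", "a b", "é", "."];

for (const s of samples) {
  // Both columns should agree: true means "reject" (the existing logic would throw).
  console.log(JSON.stringify(s), isRejected(current, s), isRejected(grouped, s));
}
```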