I’m trying (and failing) to write a JavaScript regular expression to match a PostgreSQL operator name, defined in the docs as:
An operator name is a sequence of up to NAMEDATALEN-1 (63 by default) characters from the following list:
+ - * / < > = ~ ! @ # % ^ & | ` ?
There are a few restrictions on operator names, however:
-- and /* cannot appear anywhere in an operator name, since they will be taken as the start of a comment.
A multiple-character operator name cannot end in + or -, unless the name also contains at least one of these characters:
~ ! @ # % ^ & | ` ?
For example, @- is an allowed operator name, but *- is not. This restriction allows PostgreSQL to parse SQL-compliant queries without requiring spaces between tokens.
Other than the regex-noob attempt:
[+-*/<>=~!@#%^&|`?]+
I’m failing hard to get any further, especially with the "-- and /* can’t appear in the matched string" requirement. I’ve tried wrestling "Regular expression to match a line that doesn't contain a word" into it, but with no success.
Is this even possible in a single regex? (It’s to work within moo, https://github.com/no-context/moo, so there’s no opportunity to fall back to regular JS.)
Any pointers very gratefully received!
I’ve tried things like (?!.*--)[+-*/<>=~!@#%^&|`?]+, but to no avail.
2 Answers
First of all, in a character class none of these except - (and ^, if it were at the beginning) need to be backslash-escaped, so you can simplify to [+\-*/<>=~!@#%^&|`?]+

To avoid matching comments, the easiest solution is lookahead:
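A minimal sketch of that lookahead idea, assuming the character list quoted in the question: before each character is consumed, a negative lookahead refuses to begin `--` or `/*` at that position. The names `OP_CHAR`, `noComment` and `matchAt` are hypothetical, and the sticky (`y`) flag is only used here to imitate a lexer that matches at a fixed position.

```javascript
// Every character allowed in an operator name; inside a class only "-" needs escaping.
const OP_CHAR = "[+\\-*/<>=~!@#%^&|`?]";

// Consume operator characters one at a time, but never start "--" or "/*".
const noComment = new RegExp("(?:(?!--|/\\*)" + OP_CHAR + ")+", "y");

// Hypothetical helper: match an operator run starting exactly at `pos`, or return null.
function matchAt(text, pos) {
  noComment.lastIndex = pos;
  const m = noComment.exec(text);
  return m ? m[0] : null;
}

console.log(matchAt("a <= b", 2));     // "<="
console.log(matchAt("a <-- b", 2));    // "<"  (stops before the comment)
console.log(matchAt("-- note", 0));    // null (a comment, not an operator)
console.log(matchAt("a */* c */", 2)); // "*"  (stops before "/*")
```

This only covers the comment restriction; the rule about a trailing + or - is not handled by this pattern.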
You can also spell it out as something like "- followed by a valid character except -, or / followed by a valid character except *, or a valid character except - or /", but that gets complicated quickly, because the repetition must still not form forbidden patterns.

For "cannot end in + or -, unless it also contains at least one […]", you can also use lookaround. But given you want to use this for a lexer, the $ anchor probably does not work, so you’ll need to use something other than $ to mark where the name ends.

This regex will capture operator names from a multi-line string.
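As a hedged sketch of how both restrictions from the question could be folded into a single pattern without a `$` anchor (not necessarily the pattern either answer used): the alternatives are tried in order, so a run containing one of the special characters is taken whole, and otherwise the match ends on the last character that is not + or -. The names `ANY`, `SPECIAL`, `STEP`, `operatorSource` and `readOperator` are hypothetical. Two assumptions apply: a lexer should shorten a run such as `*-` to `*` rather than reject it, and the 63-character limit is not enforced.

```javascript
const ANY = "[+\\-*/<>=~!@#%^&|`?]"; // every operator character
const SPECIAL = "[~!@#%^&|`?]";      // characters that permit a trailing + or -
// One operator character that does not begin a comment ("--" or "/*").
const STEP = "(?:(?!--|/\\*)" + ANY + ")";

const operatorSource = [
  // 1. The run contains a special character: take all of it, trailing sign included.
  "(?=" + STEP + "*" + SPECIAL + ")" + STEP + "+",
  // 2. Otherwise backtrack so the match ends on a character that is not + or -.
  STEP + "*(?![+\\-])" + STEP,
  // 3. Otherwise a lone + or - that does not start "--".
  "(?!--)[+\\-]",
].join("|");

// Sticky flag: match only at lastIndex, as a lexer would.
const operatorRe = new RegExp(operatorSource, "y");

// Hypothetical helper: read one operator token starting exactly at `pos`, or null.
function readOperator(text, pos = 0) {
  operatorRe.lastIndex = pos;
  const m = operatorRe.exec(text);
  return m ? m[0] : null;
}

console.log(readOperator("@-"));   // "@-" (allowed: contains @)
console.log(readOperator("*-"));   // "*"  (the trailing - is left for the next token)
console.log(readOperator("<=+1")); // "<="
console.log(readOperator("+-"));   // "+"
console.log(readOperator("<--x")); // "<"
console.log(readOperator("--x"));  // null
```

Because no alternative relies on `$`, the same source string should also work when the pattern is embedded in a larger tokenizer, though that has not been checked against moo itself.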
Breakdown:
Tested against the following data: