
I’m trying (and failing) to write a JavaScript regular expression to match a PostgreSQL operator name, defined in the docs as:

An operator name is a sequence of up to NAMEDATALEN-1 (63 by default) characters from the following list:

+ - * / < > = ~ ! @ # % ^ & | ` ?

There are a few restrictions on operator names, however:

  • -- and /* cannot appear anywhere in an operator name, since they will be taken as the start of a comment.

  • A multiple-character operator name cannot end in + or -, unless the name also contains at least one of these characters:

    ~ ! @ # % ^ & | ` ?

    For example, @- is an allowed operator name, but *- is not. This restriction allows PostgreSQL to parse SQL-compliant queries without requiring spaces between tokens.
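
For cross-checking any candidate pattern, the quoted rules can also be written out as plain JavaScript. This is only a sketch for testing, not something that can run inside the lexer. `isValidOperatorName` and the constant names are made up for illustration, and the set of characters that permits a trailing + or - is taken to include ~, as in the PostgreSQL documentation.

```javascript
// Reference check for the operator-name rules quoted above.
// All names here are hypothetical helpers, not part of any library.
const ALLOWED = "+-*/<>=~!@#%^&|`?";
// Characters whose presence permits a trailing + or - (assumed to include ~).
const PERMITS_TRAILING_SIGN = "~!@#%^&|`?";
const MAX_LENGTH = 63; // NAMEDATALEN - 1 with the default setting

function isValidOperatorName(name) {
  if (name.length < 1 || name.length > MAX_LENGTH) return false;
  // Every character must come from the allowed list.
  for (const ch of name) {
    if (!ALLOWED.includes(ch)) return false;
  }
  // "--" and "/*" would be taken as the start of a comment.
  if (name.includes("--") || name.includes("/*")) return false;
  // A multi-character name may end in + or - only if it also
  // contains at least one of the permitting characters.
  const last = name[name.length - 1];
  if (name.length > 1 && (last === "+" || last === "-")) {
    return [...name].some((ch) => PERMITS_TRAILING_SIGN.includes(ch));
  }
  return true;
}
```

With this, `isValidOperatorName("@-")` is true and `isValidOperatorName("*-")` is false, matching the example in the docs.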

Other than the regex-noob attempt:

[\+\-\*\/\<\>\=\~\!\@\#\%\^\&\|\`\?]+

I’m failing hard to get any further, especially with the "-- and /* can’t appear in the matched string" requirement. I’ve tried wrestling the approach from "Regular expression to match a line that doesn't contain a word" into it, but with no success.

Is this even possible in a single regex? (It has to work within the moo lexer, https://github.com/no-context/moo, so there is no opportunity to fall back to regular JS.)

Any pointers very gratefully received!

I’ve tried things like (?!.*--)[\+\-\*\/\<\>\=\~\!\@\#\%\^\&\|\`\?]+, but to no avail.

2 Answers


  1. First of all, in a character class none of these except - (and ^, if it were at the beginning) need to be backslash-escaped, so you can simplify to

    /[+\-*/<>=~!@#%^&|`?]+/
    

    To avoid matching comments, the easiest solution is lookahead:

    /(?:(?!--|\/\*)[+\-*/<>=~!@#%^&|`?])+/
    

    but you can also spell it out as something like "- followed by a valid character except -, or / followed by a valid character except *, or a valid character except - or /". That gets complicated quickly, though, because the repetition must still never form one of the forbidden patterns.

    For "cannot end in + or -, unless it also contains at least one […]", you can also use lookaround:

    /(?=.*[~!@#%^&|`?]|.*[^+-]$)(?:(?!--|\/\*)[+\-*/<>=~!@#%^&|`?])+/
    

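    But given you want to use this for a lexer, the $ anchor probably does not work, so you’ll need to spell out the alternatives instead. The alternative that contains one of ~ ! @ # % ^ & | ` ? has to be tried first, otherwise a name like @- would be cut short to @:

    /(?:(?!--|\/\*)[+\-*/<>=~!@#%^&|`?])*[~!@#%^&|`?](?:(?!--|\/\*)[+\-*/<>=~!@#%^&|`?])*|(?:(?!--|\/\*)[+\-*/<>=~!@#%^&|`?])+(?<![+-])|[+-]/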
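
To see how the lexer-style pattern from the first answer behaves, here is a minimal sketch that builds it from named pieces and matches it at a fixed offset with the sticky flag, which is roughly how a tokenizer consumes input. The names `CHAR`, `SPECIAL`, `SAFE`, `OPERATOR` and `matchOperatorAt` are hypothetical. Two choices are assumptions of this sketch: the alternative requiring a special character is tried first so that @- is matched whole, and the lone + or - is guarded with (?!--) so that the start of a comment is not matched.

```javascript
// Building blocks, written as strings for the RegExp constructor.
const CHAR = "[+\\-*/<>=~!@#%^&|`?]";        // any allowed operator character
const SPECIAL = "[~!@#%^&|`?]";               // characters that permit a trailing + or -
const SAFE = "(?:(?!--|/\\*)" + CHAR + ")";   // allowed character that does not start a comment

const OPERATOR = new RegExp(
  SAFE + "*" + SPECIAL + SAFE + "*" + // contains a special character: may end in + or -
  "|" + SAFE + "+(?<![+-])" +         // otherwise the name must not end in + or -
  "|(?!--)[+-]",                      // a lone + or - (guard is an assumption of this sketch)
  "y"                                 // sticky: match only at lastIndex
);

// Returns the operator token starting exactly at `offset`, or null.
function matchOperatorAt(input, offset) {
  OPERATOR.lastIndex = offset;
  const m = OPERATOR.exec(input);
  return m ? m[0] : null;
}

console.log(matchOperatorAt("@-", 0)); // "@-"
console.log(matchOperatorAt("*-", 0)); // "*"  (the trailing - is left for the next token)
console.log(matchOperatorAt("@/*", 0)); // "@" (stops before the comment start)
```

Note that the lookbehind `(?<![+-])` needs an engine with ES2018 lookbehind support.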
    
  2. This regex will capture operator names from a multi-line string.

    /(?:^|[\t ])(?!\S*?(?:--|\/\*))(?![+\-*/<>=]+[+-](?=[ \n]))([+\-*/<>=~!@#%^&|`?]{1,63})/gm
    

    Breakdown:

    (?:^|[\t ])           <- Operator name is at start of line or is preceded by whitespace.
    (?!\S*?(?:--|\/\*))   <- It may not contain "--" or "/*"
    (?![+\-*/<>=]+[+-](?=[ \n]))  
                          <- May not end in "+" or "-" if missing certain characters.
                             Word boundaries don't work with these special characters, so 
                             we detect by looking for space or newline following our name.
    ([+\-*/<>=~!@#%^&|`?]{1,63})
                          <- Contains a sequence of 1 to 63 of a set of allowed characters.
    

    Tested against the following data:

    [GOOD]
    -
    + +
    * *<
    / < > >*
    = ~ !
    @ # % ^
    & | < >
    *->
    [BAD]
    --
    /*
    +--
    @/*
    *-
    <+
    ++
    [MIX]
    !!@ #/*
    #/* !!@
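
As a rough illustration of how the second answer's pattern could be applied, the sketch below runs it over a small multi-line string with `matchAll` and collects capture group 1. The pattern is reproduced with its escapes written out; `OPERATOR_NAMES` and `extractOperatorNames` are hypothetical names, and the sample text is made up for the example.

```javascript
// The multi-line pattern with backslash escapes written out.
const OPERATOR_NAMES =
  /(?:^|[\t ])(?!\S*?(?:--|\/\*))(?![+\-*/<>=]+[+-](?=[ \n]))([+\-*/<>=~!@#%^&|`?]{1,63})/gm;

// Collects the captured operator names (group 1) from whitespace-separated text.
function extractOperatorNames(text) {
  return Array.from(text.matchAll(OPERATOR_NAMES), (m) => m[1]);
}

// Each line ends with a newline, which the pattern relies on to detect
// the end of a name.
const sample = "@- *-\n<= ++\n!!@ #/*\n";
console.log(extractOperatorNames(sample)); // [ '@-', '<=', '!!@' ]
```

Because the end of a name is detected by a following space or newline, a name such as *- sitting at the very end of the input without a trailing newline would slip through, so it seems safest to make sure the text ends with one.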
    