
I have a calculator, and I don't want it to start calculating until the expression looks correct. "Looks correct" means it must not contain any non-math symbols such as $ or #.
I don't care about logical validity, such as balanced parentheses or missing operands, only about invalid characters.
I use a client-server approach.
To accomplish this I want to use a regex (it could be provided with the list of available operations).

For example:

  • 3 + 10 – correct
  • tan(45 * PI / 180) – correct
  • 5 % 10 – correct
  • 3 + # – incorrect
  • 3 + – correct
  • 5 + 3 * ( 2 – also correct, symbols are perfectly valid

I tried to use a regex built from the available operations' symbols, but here are some complications I encountered:

  1. An operation's symbol can vary in length. It can be either a single character or a function name, so the two cases have to be handled separately for the regex to work correctly.
    I was using a character class: [+-tan] does not work as intended, because it matches any single letter from tan, whereas I need to match tan only as a whole.
  2. Depending on the list of available operations does not seem like a good idea to me; I need a more general way to test an expression in case I want to use it elsewhere.
  3. The main problem with my regex was that as soon as it encountered a single valid character it reported the expression as correct, regardless of any invalid characters that followed.
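
A minimal sketch of points 1 and 3, assuming a JavaScript regex (the question does not name a language, so this is an assumption). Inside square brackets, +-t is read as a range from + to t, so the class matches single characters rather than the word tan, and an unanchored test succeeds as soon as any one character matches. The names attempt and wholeWord are only illustrative.

```javascript
// [+-tan] is a character class: "+-t" is a range (U+002B to U+0074),
// followed by the single characters "a" and "n".
const attempt = /[+-tan]/;

// Unanchored: one matching character anywhere is enough.
console.log(attempt.test("3 + #"));    // true, "3" falls inside the +-t range
console.log(attempt.test("x"));        // false, "x" comes after "t"

// Alternation matches the whole word instead of its individual letters.
const wholeWord = /tan|[+-]/;
console.log(wholeWord.test("t"));      // false
console.log(wholeWord.test("tan(1)")); // true
```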

2 Answers


  1. Don't do this. A regex is not a lexer; you should use a lexer instead.
    See this for more info: https://stackoverflow.com/a/1732454/21517472

    willy

  2. As far as I understood you’re only interested in checking that individual characters are in a closed set of valid characters.

    From the examples you’ve given that set of characters consists of:

    • digits, and point (for decimal separator)
    • operators: +-*/%
    • letters from the English alphabet, to be used in tan, PI, etc. (in regex \w, which also covers digits and underscore)
    • parentheses
    • white space (in regex \s)

    Combining that we get this character class in regex syntax:

    [\w\s.()+*\/%-]
    

    NB: put the - as last so it doesn’t get interpreted as a range separator. And escape the / with a backslash so in JavaScript it isn’t interpreted as the end of a regex literal.

    So to validate, you could check whether the input contains any character that is not in this class (using a negated class, [^...]). If so, reject the input.

    // True when the input contains at least one character outside the allowed set.
    const isInvalid = s => /[^\w\s.()+*\/%-]/.test(s);
    
    const tests = [
        "3 + 10",
        "tan(45 * PI / 180)",
        "5 % 10",
        "3 + #",
        "3 +",
        "5 + 3 * ( 2 -",
    ];
    
    for (const test of tests) {
        console.log(test, isInvalid(test) ? "invalid" : "correct");
    }
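
If arbitrary words should be rejected as well (the \w in the class above accepts any letters), one possible variation is to build an anchored pattern from a list of allowed names and symbols. This is only a sketch under assumptions: makeValidator, escapeRegExp, and the two lists are hypothetical and not part of the answer above, and the names are assumed to consist of word characters only.

```javascript
// Escape characters that have a special meaning in a regular expression.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// names and symbols are hypothetical parameters chosen for illustration.
function makeValidator(names, symbols) {
    const alternatives = [
        // \b on both sides so that a name only matches as a whole word.
        ...names.map(name => `\\b${escapeRegExp(name)}\\b`),
        ...symbols.map(escapeRegExp),
        "\\d", // a digit
        "\\.", // decimal point
        "\\s", // white space
    ];
    // ^ and $ force every character of the input to belong to some alternative.
    const pattern = new RegExp(`^(?:${alternatives.join("|")})*$`);
    return s => pattern.test(s);
}

const isValid = makeValidator(["tan", "PI"], ["+", "-", "*", "/", "%", "(", ")"]);

console.log(isValid("tan(45 * PI / 180)")); // true
console.log(isValid("5 + 3 * ( 2 -"));      // true
console.log(isValid("3 + #"));              // false
console.log(isValid("3 + foo"));            // false, "foo" is not in the list
```

Because of the word boundaries, a name glued to a digit (for example 2PI) is rejected; whether that is desirable depends on the calculator's syntax.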

    Parsing

    It is clear that the above does not prevent the user from entering invalid expressions. Expressions have (recursive) grammar rules that cannot be checked by regex (alone).
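
As a small illustration of a rule of that kind, balanced parentheses need a counter (or a stack) rather than a character class. The function name parenthesesBalanced is hypothetical, and this sketch checks only nesting, nothing else about the grammar.

```javascript
// Returns true when every ")" closes an earlier "(" and no "(" is left open.
function parenthesesBalanced(s) {
    let depth = 0;
    for (const ch of s) {
        if (ch === "(") {
            depth++;
        } else if (ch === ")") {
            depth--;
            if (depth < 0) return false; // a ")" with nothing to close
        }
    }
    return depth === 0;
}

console.log(parenthesesBalanced("tan(45 * PI / 180)")); // true
console.log(parenthesesBalanced("5 + 3 * ( 2 -"));      // false
```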

    Yet, you could do that with a parser. Just for reference, here is one I wrote in answer to another question: it allows you to define the list of operators and functions you want to support, and it either returns the calculated value or throws an error when the syntax is not correct.
