I want to detect invalid boolean queries that do not use parentheses. Specifically, I want to detect cases where multiple operators are used in a boolean query without parentheses, which can lead to ambiguous or incorrect results.
For example, consider the following boolean query:
love AND family OR trees
The user could mean (love AND family) OR trees
or love AND (family OR trees)
, and the results would be different. To avoid such ambiguity, I want to give the user an error message when they try to do searches like this.
More Examples:
These queries do not need parentheses and are valid:
love AND family AND trees
,
love OR family OR trees OR peace
,
love NOT family NOT peace NOT peace
,
love OR family OR (ash AND peace)
,
(love OR family OR ash) AND peace
The below queries need parentheses and are invalid:
love AND family OR trees
,
love NOT family OR trees
,
love AND family NOT trees
,
love AND family OR war AND peace
,
(love OR family OR ash) AND peace OR ash
How can I detect invalid boolean queries without parentheses?
3
Answers
A simple way to solve this would be to define a formal grammar for your language, feed it to a parser generator like peg.js and include the generated parser in your project. Eventually, you’re going to need a parser anyways, to evaluate your expressions.
One possible grammar for your language could be like this ((you can try it on https://pegjs.org/online)
This grammar accepts your valid examples and rejects invalid ones.
Assuming that the names only consist of letters, numbers and
_
, here’s a regex-based solution:Explanation:
First, from observation, we can see that valid queries follow a pattern:
This sounds like a thing for regex:
Since JS regex doesn’t support recursion, we have to polyfill that on our own. For a query to be valid, all parenthesized expressions in it also have to be valid. We replace them with a valid unused name like
_
if they are valid; otherwise, we omit it entirely so that the regex test will returns false.If at least one expression is invalid, the query will be something like this, which won’t match the final regex test:
Try it:
If your question only aims at identifying the cases of AND and OR at same level, you may specify a recursive procedure,i.e checkAndOr(expression) :
(each term starts at first opening parenthesis and ends at corresponding
closure.