i have a mongoDdb collection of about 10 billion documents. I want to be able to search in the collection using an $or query but gives priority over the conditions.
here is my existing code :
const prefixChar = 'A';
const latterword = 'XYZ';
await data.find({
$or: [
{
prefix: prefixChar,
number: { $exists: true },
latter: latterword
},
{
prefix: { $exists: true },
number: { $exists: true },
latter: latterword
},
{
prefix: prefixChar,
number: { $exists: true },
latter: { $regex: `^${latterword[0]}${latterword[1]}` }
},
{
prefix: prefixChar,
number: { $exists: true },
latter: { $regex: `^${latterword[0]}` }
},
{
prefix: prefixChar,
number: { $exists: true },
latter: { $exists: true }
}
]
}).limit(12);
i have also tried to execute the queries one by one. it takes too much time. i need the response time between 0 to 2000 milliseconds. i have already put an index on those three fields and tried parallel execution of a query in Python. but with billions of records, it’s still too slow.
2
Answers
You can achieve this with some Boolean reduction.
1. For starters, since
number: { $exists: true }
is in all the criteria, separate that from $or clauses and use $and to include it. Like:2. In the 3rd & 4th clauses, the difference is only
vs
So the 4th case already covers the 3rd case and is more broad, so the 3rd case can be removed. Now, your remaining query is this.
3. For a regex to match, or for a string to match – as required by the remaining clauses – the field has to exist. So the
exists
requirements can be moved into the firstnumber-exists
check anyway.Now the you’re left with these in the
$or
part:4. Excluding the regex, that means either
prefix
matches orlatter
matches, or both but that’s covered in the either case already:So finally it’s:
Also, you should
sort
when usinglimit
; otherwise results will be in a random order every time you execute it.From the scale of the dataset, I am guessing it is likely to be some datalake collection that might have a high read-write ratio. i.e. read frequently but not updated frequently. If that is the case and your parameter does not change frequently, you may consider precomputing the priority and store each execution’s results in a separate collection such that you can refer the same execution result in future.
Mongo Playground
Afterwards, when you are trying to fetch the result, you can start a
$lookup
from yourprecomputed
collection. The precomputed collection can be indexed to boost the performance.Mongo Playground