skip to Main Content

i have a mongoDdb collection of about 10 billion documents. I want to be able to search in the collection using an $or query but gives priority over the conditions.

here is my existing code :

const prefixChar = 'A';
const latterword = 'XYZ';

await data.find({
  $or: [
    {
      prefix: prefixChar,
      number: { $exists: true },
      latter: latterword
    },
    {
      prefix: { $exists: true },
      number: { $exists: true },
      latter: latterword
    },
    {
      prefix: prefixChar,
      number: { $exists: true },
      latter: { $regex: `^${latterword[0]}${latterword[1]}` }
    },
    {
      prefix: prefixChar,
      number: { $exists: true },
      latter: { $regex: `^${latterword[0]}` }
    },
    {
      prefix: prefixChar,
      number: { $exists: true },
      latter: { $exists: true }
    }
  ]
}).limit(12);

i have also tried to execute the queries one by one. it takes too much time. i need the response time between 0 to 2000 milliseconds. i have already put an index on those three fields and tried parallel execution of a query in Python. but with billions of records, it’s still too slow.

2

Answers


  1. You can achieve this with some Boolean reduction.

    1. For starters, since number: { $exists: true } is in all the criteria, separate that from $or clauses and use $and to include it. Like:

    data.find({
      $and: [
        { number: { $exists: true } },
        {
          $or: [
            {
              prefix: prefixChar,
              latter: latterword
            }
            ...  // all the other criteria here
          ]
        }
      ]
    })
    

    2. In the 3rd & 4th clauses, the difference is only

    latter: { $regex: `^${latterword[0]}${latterword[1]}` }
    

    vs

    latter: { $regex: `^${latterword[0]}` }
    

    So the 4th case already covers the 3rd case and is more broad, so the 3rd case can be removed. Now, your remaining query is this.

    3. For a regex to match, or for a string to match – as required by the remaining clauses – the field has to exist. So the exists requirements can be moved into the first number-exists check anyway.

    Now the you’re left with these in the $or part:

    {
      $or: [
        {
          prefix: "prefixChar",
          latter: "latterword"
        },
        { latter: "latterword" },
        {
          prefix: "prefixChar",
          latter: { $regex: `^${latterword[0]}` }
        },
        { prefix: "prefixChar" }
      ]
    }
    

    4. Excluding the regex, that means either prefix matches or latter matches, or both but that’s covered in the either case already:

    {
      $or: [
        { latter: "latterword" },
        {
          prefix: "prefixChar",
          latter: { $regex: `^${latterword[0]}` }
        },
        { prefix: "prefixChar" }
      ]
    }
    

    So finally it’s:

    data.find({
      $and: [
        {
          prefix: { $exists: true },
          number: { $exists: true },
          latter: { $exists: true }
        },
        {
          $or: [
            { latter: "latterword" },
            {
              prefix: "prefixChar",
              latter: { $regex: `^${latterword[0]}` }
            },
            { prefix: "prefixChar" }
          ]
        }
      ]
    })
    

    Also, you should sort when using limit; otherwise results will be in a random order every time you execute it.

    Login or Signup to reply.
  2. From the scale of the dataset, I am guessing it is likely to be some datalake collection that might have a high read-write ratio. i.e. read frequently but not updated frequently. If that is the case and your parameter does not change frequently, you may consider precomputing the priority and store each execution’s results in a separate collection such that you can refer the same execution result in future.

    db.collection.aggregate([
      {
        "$set": {
          priority: {
            $let: {
              vars: {
                prefixChar: "A",
                latterword: "XYZ"
              },
              "in": {
                "$switch": {
                  "branches": [
                    {
                      "case": {
                        "$and": [
                          {
                            "$eq": [
                              "$prefix",
                              "$$prefixChar"
                            ]
                          },
                          {
                            "$ne": [
                              {
                                "$ifNull": [
                                  "$number",
                                  null
                                ]
                              },
                              null
                            ]
                          },
                          {
                            "$eq": [
                              "$latter",
                              "$$latterword"
                            ]
                          }
                        ]
                      },
                      "then": 1
                    },
                    {
                      "case": {
                        "$and": [
                          {
                            "$ne": [
                              {
                                "$ifNull": [
                                  "$prefix",
                                  null
                                ]
                              },
                              null
                            ]
                          },
                          {
                            "$ne": [
                              {
                                "$ifNull": [
                                  "$number",
                                  null
                                ]
                              },
                              null
                            ]
                          },
                          {
                            "$eq": [
                              "$latter",
                              "$$latterword"
                            ]
                          }
                        ]
                      },
                      "then": 2
                    },
                    {
                      "case": {
                        "$and": [
                          {
                            "$eq": [
                              "$prefix",
                              "$$prefixChar"
                            ]
                          },
                          {
                            "$ne": [
                              {
                                "$ifNull": [
                                  "$number",
                                  null
                                ]
                              },
                              null
                            ]
                          },
                          {
                            $eq: [
                              0,
                              {
                                "$indexOfCP": [
                                  "$latter",
                                  {
                                    "$substrCP": [
                                      "$$latterword",
                                      0,
                                      2
                                    ]
                                  }
                                ]
                              }
                            ]
                          }
                        ]
                      },
                      "then": 3
                    },
                    {
                      "case": {
                        "$and": [
                          {
                            "$eq": [
                              "$prefix",
                              "$$prefixChar"
                            ]
                          },
                          {
                            "$ne": [
                              {
                                "$ifNull": [
                                  "$number",
                                  null
                                ]
                              },
                              null
                            ]
                          },
                          {
                            $eq: [
                              0,
                              {
                                "$indexOfCP": [
                                  "$latter",
                                  {
                                    "$substrCP": [
                                      "$$latterword",
                                      0,
                                      1
                                    ]
                                  }
                                ]
                              }
                            ]
                          }
                        ]
                      },
                      "then": 4
                    },
                    {
                      "case": {
                        "$and": [
                          {
                            "$eq": [
                              "$prefix",
                              "$$prefixChar"
                            ]
                          },
                          {
                            "$ne": [
                              {
                                "$ifNull": [
                                  "$number",
                                  null
                                ]
                              },
                              null
                            ]
                          },
                          {
                            "$ne": [
                              {
                                "$ifNull": [
                                  "$latter",
                                  null
                                ]
                              },
                              null
                            ]
                          }
                        ]
                      },
                      "then": 5
                    }
                  ],
                  "default": 6
                }
              }
            }
          }
        }
      },
      {
        "$group": {
          "_id": "$priority",
          "docIds": {
            "$push": "$_id"
          }
        }
      },
      {
        "$set": {
          "_id": {
            "executionId": "fill in your preferred id",
            "priority": "$_id",
            "prefixChar": "A",
            "latterword": "XYZ"
          }
        }
      },
      {
        "$merge": {
          "into": "precomputed"
        }
      }
    ])
    

    Mongo Playground


    Afterwards, when you are trying to fetch the result, you can start a $lookup from your precomputed collection. The precomputed collection can be indexed to boost the performance.

    db.precomputed.aggregate([
      {
        "$match": {
          "_id.executionId": "fill in your preferred id"
        }
      },
      {
        "$sort": {
          "_id.priority": 1
        }
      },
      {
        "$unwind": "$docIds"
      },
      {
        "$limit": 12
      },
      {
        "$lookup": {
          "from": "collection",
          "localField": "docIds",
          "foreignField": "_id",
          "as": "rawDocsLookup"
        }
      },
      {
        "$unwind": "$rawDocsLookup"
      },
      {
        "$replaceRoot": {
          "newRoot": {
            "$mergeObjects": [
              "$rawDocsLookup",
              {
                priority: "$_id.priority"
              }
            ]
          }
        }
      }
    ])
    

    Mongo Playground

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search