
I have approximately 1.1 million records in MongoDB and have written the query below to get the filtered data. When I try to run the query in MongoDB Compass, it gives an "exceeded time limit" error. I am new to MongoDB and don't have much idea of how to optimize it.

[
  {
    $match: {
      offerCheckedDate: {
        $gte: ISODate("2022-11-14T00:00:00.000Z"),
        $lt: ISODate("2022-12-20T00:00:00.000Z"),
      },
      offerAvailable: "YES",
      channelId: {
        $in: [1000001, 1000000]
      }
    },
  },
  {
    $group: {
      _id: {
        mobile: "$mobile",
      },
      mobile: {
        $addToSet: "$mobile",
      },
    },
  },
  {
    $unwind: "$mobile",
  },
  {
    $lookup: {
      from: "PA_DATA_REPORTING",
      localField: "mobile",
      foreignField: "mobile",
      as: "result",
    },
  },
  {
    $unwind: "$result",
  },
  {
    $replaceRoot: {
      newRoot: "$result",
    },
  },
  {
    $match: {
      customAppliedDate: {
        $gte: ISODate("2022-11-14T00:00:00.000Z"),
        $lt: ISODate("2022-11-20T00:00:00.000Z"),
      },
    },
  },
  {
    $project: {
      equal: {
        $eq: ["$financierId", "$appliedFinancierId"],
      },
      doc: "$$ROOT",
    },
  },
  {
    $match: {
      equal: true,
    },
  },
  {
    $group: {
      _id: {
        financierId: "$doc.financierId",
      },
      mobile: {
        $addToSet: "$doc.mobile",
      },
    },
  },
  {
    $unwind: "$mobile",
  },
  {
    $group: {
      _id: "$_id.financierId",
      mobileCount: {
        $sum: 1,
      },
    },
  },
]

I tried adding a pipeline inside $lookup, but even that didn't help. Something like this:

{
  from: "PA_DATA_REPORTING",
  localField: "mobile",
  foreignField: "mobile",
  pipeline: [
    {
      $match: {
        customAppliedDate: {
          $gte: ISODate("2022-11-14T00:00:00.000Z"),
          $lt: ISODate("2022-11-18T00:00:00.000Z"),
        },
      },
    },
  ],
  as: "result",
}
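Note that combining localField/foreignField with pipeline in a single $lookup is only supported on MongoDB 5.0 and newer; on older servers the two forms are mutually exclusive. A sketch of the older let/$expr form of the same lookup (field names taken from the question) would be:

```
{
  $lookup: {
    from: "PA_DATA_REPORTING",
    let: { m: "$mobile" },
    pipeline: [
      {
        $match: {
          // Correlate on mobile explicitly, since localField/foreignField
          // cannot be combined with pipeline before MongoDB 5.0.
          $expr: { $eq: ["$mobile", "$$m"] },
          customAppliedDate: {
            $gte: ISODate("2022-11-14T00:00:00.000Z"),
            $lt: ISODate("2022-11-18T00:00:00.000Z"),
          },
        },
      },
    ],
    as: "result",
  }
}
```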

Below is the sample document I am iterating through.


{
  "_id": {
    "$binary": {
      "base64": "fURSsmgrcSh/xWN/ENWwiA==",
      "subType": "03"
    }
  },
  "enquiryId": "e4f22813-66f9-4a09-9e92-66bacd791943",
  "mobile": "7945536728",
  "financierId": {
    "$numberLong": "280005"
  },
  "channelId": {
    "$numberLong": "1000000"
  },
  "offerAvailable": "NO",
  "offerCheckedDate": {
    "$date": {
      "$numberLong": "1640975400000"
    }
  },
  "financierName": "Cholamandalam Finance",
  "bankOfferAmount": {
    "$numberLong": "10000"
  },
  "appliedFinancierId": {
    "$numberLong": "280004"
  },
  "appliedFinancierName": "Cholamandalam Finance",
  "paAppliedDate": {
    "$date": {
      "$numberLong": "1640975400000"
    }
  },
  "paDisbursedDate": {
    "$date": {
      "$numberLong": "1640975400000"
    }
  },
  "paSanctionedDate": {
    "$date": {
      "$numberLong": "1640975400000"
    }
  },
  "customAppliedDate": {
    "$date": {
      "$numberLong": "1640975400000"
    }
  },
  "customSanctionedDate": {
    "$date": {
      "$numberLong": "1640975400000"
    }
  },
  "landedOnPAOfferPage": "NO",
  "_class": "com.maruti.fmp.reporting.domain.document.PAOfferDocument"
}

Is there any way I can optimize the query and resolve the timeout error?

2 Answers


  1. First of all, since there is no information about the data in the database (what the inner objects look like, given the many $unwind operations, or which indexes exist, which strongly affects query execution time), the problem could be in several places. I can only offer some general advice on how to optimize queries.
    Here is a flow for solving performance problems:

    1. MongoDB provides a tool for analyzing queries: use explain to check the critical parts of your query.
    • A collection scan (COLLSCAN) means all documents in the collection must be read.
    • An index scan (IXSCAN) limits the number of documents that must be inspected.
      Take a look here at how to read explain output, with an example.
    2. Add indexes for the critical parts of your query (do not index every field; index only the fields you filter or join on). More about query optimization.
      It looks like one of the $match fields consumes most of the query processing time (check offerAvailable and channelId).
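For example, under the assumption that the first $match is the expensive stage, a compound index placing its equality fields before its range field (the equality-then-range guideline), plus an explain run, might look like this in mongosh. The collection name is assumed to be PA_DATA_REPORTING, as in the question's $lookup:

```
// Equality fields (offerAvailable, channelId) come before the
// range field (offerCheckedDate) so the index can serve the $match.
db.PA_DATA_REPORTING.createIndex(
  { offerAvailable: 1, channelId: 1, offerCheckedDate: 1 }
)

// Re-run the aggregation with explain and check whether the plan
// now reports IXSCAN instead of COLLSCAN in the first stage.
db.PA_DATA_REPORTING.explain("executionStats").aggregate([ /* your pipeline */ ])

// The $lookup's foreignField also benefits from an index,
// since the join runs once per input document.
db.PA_DATA_REPORTING.createIndex({ mobile: 1 })
```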
  2. I have no idea what you are trying to achieve; it is a bit difficult with only one sample document. Anyway, this one returns the same result as your query:

    db.collection.aggregate([
       {
          $match: {
             offerCheckedDate: {
                $gte: ISODate("2022-11-14T00:00:00.000Z"),
                $lt: ISODate("2022-12-20T00:00:00.000Z"),
             },
             offerAvailable: "YES",
             channelId: { $in: [1000001, 1000000] },
          }
       },
       { $match: { $expr: { $eq: ["$financierId", "$appliedFinancierId"] } } },
       {
          $group: {
             _id: "$financierId",
             mobiles: { $addToSet: "$mobile" },
          }
       },
       {
          $project: {
             mobileCount: { $size: "$mobiles" }
          }
       }
    ])
    

    Most likely it is not exactly what you are looking for, but in general it seems your aggregation pipeline does a lot of redundant work.
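    A quick way to convince yourself that the $project-an-`equal`-flag-then-$match pair from the original pipeline collapses into a single $expr match is to simulate both on plain objects (sample data below is made up; only the two compared fields matter):

```javascript
// Hypothetical sample documents with the two fields the pipeline compares.
const docs = [
  { mobile: "7945536728", financierId: 280005, appliedFinancierId: 280004 },
  { mobile: "9876543210", financierId: 280004, appliedFinancierId: 280004 },
  { mobile: "9123456789", financierId: 280001, appliedFinancierId: 280001 },
];

// Original approach: $project an `equal` flag plus the doc, then $match on the flag.
const projectThenMatch = docs
  .map((d) => ({ equal: d.financierId === d.appliedFinancierId, doc: d }))
  .filter((p) => p.equal)
  .map((p) => p.doc);

// Simplified approach: one $match with $expr: { $eq: [...] }.
const exprMatch = docs.filter((d) => d.financierId === d.appliedFinancierId);

console.log(JSON.stringify(projectThenMatch) === JSON.stringify(exprMatch)); // true
```

    The simplified form also lets the two equality checks happen in one pass instead of materializing a wrapper document for every row.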

    Maybe $setWindowFields is also a useful function for your use-case.

    Mongo Playground
