I have 4 conditions which I want to use to fetch my documents:

  1. The date (createdAt) MUST be between from_date and to_date
  2. imageKey MUST be included in IDS
  3. EITHER (at least 1 of) "updatedCoordinates.status" should be in ["ORIGINALLY_UNDETECTED","CHANGED"]
  4. OR the length of updatedCoordinates should NOT be equal to predictedCoordinates

I have written a query for the first 2 conditions. I need to know whether my 3rd clause satisfies the "at least 1" requirement, and how to correctly implement the 4th.

from datetime import datetime

from_date = datetime(2022, 9, 25, 0, 1, 1, 0)
to_date = datetime(2022, 9, 28, 23, 59, 59, 0)

QUERY = {"$and":[
    {'imageKey':{"$in":IDS}},
    {"createdAt": {"$gte": from_date, "$lt": to_date}},
    {
        "$or":[
            {"updatedCoordinates.status": {"$in":["ORIGINALLY_UNDETECTED","CHANGED"]}},
            {"$ne":[{"$size":"updatedCoordinates"},{"$size":"predictedCoordinates"}]}
        ]
    }]}

my_collection.count_documents(QUERY)

I could apply the 3rd and 4th conditions in Python after fetching all the data, like this:

def is_valid(box):
    if len(box["updatedCoordinates"]) != len(box["predictedCoordinates"]): return True
    
    for up in box["updatedCoordinates"]:
        if up["status"] in ["ORIGINALLY_UNDETECTED","CHANGED"]: return True
    return False

But that would add a lot of overhead.

Here is what my data looks like:

{
 'createdAt': datetime.datetime(2012, 8, 12, 10, 1, 47, 693000),
 'imageKey': '12345678901234567890123456789012',
 'updatedCoordinates': [{
   'xmin': 102.6942138671875,
   'xmax': 1020.8214111328125,
   'ymin': 748.9141235351562,
   'ymax': 962.6471557617188,
   'status': 'ORIGINALLY_UNDETECTED',
   },
  {
   'xmin': 84.90644836425781,
   'xmax': 1061.1834716796875,
   'ymin': 1089.3341064453125,
   'ymax': 1381.4041748046875,
   'status': 'CHANGED',
   },
  {
   'xmin': 108.43966674804688,
   'xmax': 1022.5031127929688,
   'ymin': 307.7042236328125,
   'ymax': 546.1324462890625,
   'status': 'UNCHANGED',
   },
  ],
 'predictedCoordinates': [{
   'xmin': 139.69284057617188,
   'xmax': 1008.458251953125,
   'ymin': 733.7783203125,
   'ymax': 951.54638671875,
   'status': 'UNCHANGED',
   },
  {
   'xmin': 155.5401153564453,
   'xmax': 989.7386474609375,
   'ymin': 1175.1036376953125,
   'ymax': 1389.05322265625,
   'status': 'UNCHANGED',
   },
  ]}

2 Answers


  1. Since your 4th condition compares two fields of the same document, you need aggregation expressions (via $expr). For example, in an aggregation pipeline:

    db.collection.aggregate([
      {$match: {
          $expr: {
            $and: [
              {$in: ["$imageKey", IDS]},
              {$gte: ["$createdAt", from_date]},
              {$lt: ["$createdAt", to_date]},
              {$or: [
                  {$gt: [
                      { $size: {
                        $setIntersection: [
                          "$updatedCoordinates.status",
                          ["ORIGINALLY_UNDETECTED", "CHANGED"]
                        ]
                      }},
                      0
                    ]
                  },
                  {$ne: [{$size: "$updatedCoordinates"}, {$size: "$predictedCoordinates"}]}
              ]}
            ]
          }
        }
      }
    ])
    

    See how it works on the playground example
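
The same pipeline can also be built from Python. The following is only a minimal sketch, assuming PyMongo and the question's `my_collection`, `IDS`, `from_date` and `to_date`. The helper name `build_pipeline` and the trailing `$count` stage are illustrative additions and not part of the answer above.

```python
from datetime import datetime

CHANGED_STATUSES = ["ORIGINALLY_UNDETECTED", "CHANGED"]


def build_pipeline(ids, from_date, to_date):
    """Build the aggregation pipeline (build_pipeline is a hypothetical helper name)."""
    # Condition 3: at least one status in updatedCoordinates is in the list.
    has_changed_status = {
        "$gt": [
            {"$size": {"$setIntersection": [
                "$updatedCoordinates.status",
                CHANGED_STATUSES,
            ]}},
            0,
        ]
    }
    # Condition 4: the two arrays have different lengths.
    sizes_differ = {
        "$ne": [{"$size": "$updatedCoordinates"}, {"$size": "$predictedCoordinates"}]
    }
    match_stage = {"$match": {"$expr": {"$and": [
        {"$in": ["$imageKey", ids]},
        {"$gte": ["$createdAt", from_date]},
        {"$lt": ["$createdAt", to_date]},
        {"$or": [has_changed_status, sizes_differ]},
    ]}}}
    # Assumption: a $count stage is appended so the result resembles count_documents.
    return [match_stage, {"$count": "matching"}]


# Usage, assuming a live PyMongo collection named my_collection:
# result = list(my_collection.aggregate(build_pipeline(IDS, from_date, to_date)))
# count = result[0]["matching"] if result else 0  # $count emits nothing on zero matches
```

Note that `$size` raises an error when its argument is missing or is not an array, so this sketch assumes every document carries both `updatedCoordinates` and `predictedCoordinates`.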

  2. For condition 4, use the $expr operator (available since MongoDB 3.6), which lets you use aggregation expressions inside a regular query, to compare the sizes of the updatedCoordinates and predictedCoordinates arrays.

    
    QUERY = {"$and":[
        {'imageKey':{"$in":IDS}},
        {"createdAt": {"$gte": from_date, "$lt": to_date}},
        {
            "$or":[
                {"updatedCoordinates.status": {"$in":["ORIGINALLY_UNDETECTED","CHANGED"]}},
                {"$expr": {"$ne": [{"$size":"$updatedCoordinates"},{"$size":"$predictedCoordinates"}]}}
            ]
        }]}
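
As a rough sketch, the filter can be wrapped in a function and passed to `count_documents`, as in the question. The helper name `build_query` is hypothetical; `my_collection` is taken from the question. The plain `"updatedCoordinates.status": {"$in": [...]}` clause matches when any element of the array has one of the listed statuses, so it covers the "at least 1" requirement of condition 3.

```python
from datetime import datetime


def build_query(ids, from_date, to_date):
    """Return a find/count filter (build_query is a hypothetical helper name)."""
    return {
        "$and": [
            {"imageKey": {"$in": ids}},
            {"createdAt": {"$gte": from_date, "$lt": to_date}},
            {"$or": [
                # Matches if ANY array element has one of these statuses.
                {"updatedCoordinates.status": {
                    "$in": ["ORIGINALLY_UNDETECTED", "CHANGED"]}},
                # $expr is needed to compare two fields of the same document.
                {"$expr": {"$ne": [
                    {"$size": "$updatedCoordinates"},
                    {"$size": "$predictedCoordinates"},
                ]}},
            ]},
        ]
    }


QUERY = build_query(
    ["12345678901234567890123456789012"],  # sample imageKey from the question
    datetime(2022, 9, 25, 0, 1, 1),
    datetime(2022, 9, 28, 23, 59, 59),
)
# With a live PyMongo collection:
# my_collection.count_documents(QUERY)
```

This assumes both arrays exist on every document, since `$size` raises an error for a missing or non-array field.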
    