skip to Main Content

I have highly unstructured dynamic data where I have a field that is a multidimensional string array. The exact size of the outer or inner arrays is not known beforehand and can vary. I would like to find all documents where any/all of the nested array elements evaluate on a condition I have. So for the example document below I only want to match it if ANY of the arrays has elements that are ALL have a string length greater than 3 – which would be true in this case.

{
  "_id": "00000000-0000-0007-0000-000000000001",
  "stringArrays": [
    [
      "Hello",
      "World",
      "!"
    ],
    [
      "Fred",
      "Todd"
    ]
  ]
}

The first array contains an element that is less than 3 the "!", but the second array with names matches both values of string length > 3. So overall the condition is met.

So far I have tried to find info about how to do this but all answers were either using fixed array indices which I can’t or hard regex conditions which works better but will not work in all my cases. Sometimes I need to also query element.Length $IN [1,3,5,..] so exact string length values and not just $LT or $EQ or $GT.

My only lead is maybe using https://www.mongodb.com/docs/manual/reference/operator/aggregation/strLenCP/#example somehow to project additional string length data to query later, but I would not know how to create a nested stringArraysLengths that maybe have the same structure but instead of values it has the string lengths I can query instead.

Maybe there are better ways to do this. I am relatively new to Mongodb and would appreciate some more experience users giving me a few hints to find the right way to do this. Thank you!

2

Answers


  1. Here’s one way you could do it with nested "$reduce" operations.

    db.collection.aggregate([
      {
        "$match": {
          "$expr": {
            "$reduce": {
              "input": "$stringArrays",
              "initialValue": false,
              "in": {
                "$or": [
                  "$$value",
                  {
                    "$reduce": {
                      "input": "$$this",
                      "initialValue": {
                        "$gt": [{"$size": "$$this"}, 0]
                      },
                      "in": {
                        "$and": [
                          "$$value",
                          {"$gt": [{"$strLenCP": "$$this"}, 3]}
                        ]
                      }
                    }
                  }
                ]
              }
            }
          }
        }
      }
    ])
    

    Try it on mongoplayground.net.

    Login or Signup to reply.
  2. This would be my approach:

    db.collection.aggregate([
       {
          $set: {
             lengthArray: {
                $map: {
                   input: "$stringArrays",
                   as: "outer",
                   in: {
                      $min: { // get the shortest length
                         $map: {
                            input: "$$outer",
                            in: { $strLenCP: "$$this" } // get length of each sub-strings
                         }
                      }
                   }
                }
             }
          }
       },
       {
          $set: {
             overall: { $max: "$lengthArray" }
          }
       },
       { $match: { overall: { $gt: 3 } } },
       //{ $unset: ["lengthArray", "overall"] }
    ])
    

    Mongo Playground

    Or use $filter to remove all short strings and check if result is not empty.

    db.collection.aggregate([
       {
          $set: {
             lengthArray: {
                $map: {
                   input: "$stringArrays",
                   as: "outer",
                   in: {
                      $filter: {
                         input: "$$outer",
                         cond: { $gt: [{ $strLenCP: "$$this" }, 3] } // remove all short strings
                      }
                   }
                }
             }
          }
       },
       {
          $set: {
             lengthArray: {
                $filter: {
                   input: "$lengthArray",
                   cond: { $gt: [{ $size: "$$this" }, 0] } // remove all empty arrays
                }
             }
          }
       },
       { $match: { lengthArray: { $ne: [] } } },
       //{ $unset: "lengthArray" }
    ])
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search