I have a document structure like this:

[
    {
        "country": "UK",
        "shops": [
            {"city": "London", "fruits": ["banana", "apple"]},
            {"city": "Birmingham", "fruits": ["banana", "pineapple"]},
        ],
    },
    {
        "country": "DE",
        "shops": [
            {"city": "Munich", "fruits": ["banana", "apple"]},
            {"city": "Berlin", "fruits": ["kiwi", "apple"]},
        ],
    },
]

In my Python script, I have a dict that maps each category value to the fruits belonging to it:

categories = {
    1: ["apple"],
    2: ["banana", "kiwi"],
    3: ["pineapple", "strawberry"]
}

DESIRED OUTPUT:

Using the MongoDB aggregation framework, I would like to add a max_category field to each document: the highest category value, according to the mapping above, of any fruit found in its shops sub-documents.

[
    {
        "country": "UK",
        "shops": [
            {"city": "London", "fruits": ["banana", "apple"]},
            {"city": "Birmingham", "fruits": ["banana", "pineapple"]},
        ],
        "max_category": 3
    },
    {
        "country": "DE",
        "shops": [
            {"city": "Munich", "fruits": ["banana", "apple"]},
            {"city": "Berlin", "fruits": ["kiwi", "apple"]},
        ],
        "max_category": 2
    },
]
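
For reference, the expected result can be pinned down in plain Python before translating it into an aggregation. This is only a minimal sketch and not part of the original question: the helper name `max_category` and the inverted lookup `fruit_to_category` are introduced here for illustration, and the sample documents are the ones shown in the desired output.

```python
categories = {
    1: ["apple"],
    2: ["banana", "kiwi"],
    3: ["pineapple", "strawberry"],
}

# Invert the mapping so each fruit points at its category.
fruit_to_category = {
    fruit: category
    for category, fruits in categories.items()
    for fruit in fruits
}


def max_category(doc):
    """Return the highest category of any fruit in any shop, or None."""
    found = [
        fruit_to_category[fruit]
        for shop in doc["shops"]
        for fruit in shop["fruits"]
        if fruit in fruit_to_category  # skip fruits without a category
    ]
    return max(found, default=None)


docs = [
    {"country": "UK", "shops": [
        {"city": "London", "fruits": ["banana", "apple"]},
        {"city": "Birmingham", "fruits": ["banana", "pineapple"]},
    ]},
    {"country": "DE", "shops": [
        {"city": "Munich", "fruits": ["banana", "apple"]},
        {"city": "Berlin", "fruits": ["kiwi", "apple"]},
    ]},
]

# Copy each document and attach the computed field.
results = [{**doc, "max_category": max_category(doc)} for doc in docs]
```

Any aggregation pipeline that solves the problem should agree with this function on the same documents, which makes it a convenient check when testing.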

Thanks for your help!

2 Answers


  1. Your category dict cannot be used in MongoDB as it stands, because document keys must be strings. You can reshape the dict into the form below for easier processing:

    [
      {
        category: 1,
        fruits: [
          "apple"
        ]
      },
      {
        category: 2,
        fruits: [
          "banana",
          "kiwi"
        ]
      },
      {
        category: 3,
        fruits: [
          "pineapple",
          "strawberry"
        ]
      }
    ]
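
The reshaping can be done in Python before the pipeline is built. A minimal sketch, assuming the `categories` dict from the question; the name `category_docs` is made up for this example:

```python
categories = {
    1: ["apple"],
    2: ["banana", "kiwi"],
    3: ["pineapple", "strawberry"],
}

# One small document per category; sorting keeps the order deterministic.
category_docs = [
    {"category": category, "fruits": fruits}
    for category, fruits in sorted(categories.items())
]
```

The integer category is now a value rather than a key, so it needs no string conversion.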
    

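    In the aggregation pipeline, $unwind the shops, then use $reduce to iterate over the reshaped category array, updating the accumulator whenever a higher category shares at least one fruit with the shop. A final $group reassembles each document and takes the $max across its shops.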

    db.collection.aggregate([
      {
        "$unwind": "$shops"
      },
      {
        "$set": {
          "max_category": {
            "$reduce": {
              "input": [
                {
                  category: 1,
                  fruits: [
                    "apple"
                  ]
                },
                {
                  category: 2,
                  fruits: [
                    "banana",
                    "kiwi"
                  ]
                },
                {
                  category: 3,
                  fruits: [
                    "pineapple",
                    "strawberry"
                  ]
                }
              ],
              "initialValue": null,
              "in": {
                "$cond": {
                  "if": {
                    $and: [
                      {
                        $gt: [
                          "$$this.category",
                          "$$value"
                        ]
                      },
                      {
                        $gt: [
                          {
                            $size: {
                              "$setIntersection": [
                                "$$this.fruits",
                                "$shops.fruits"
                              ]
                            }
                          },
                          0
                        ]
                      }
                    ]
                  },
                  "then": "$$this.category",
                  "else": "$$value"
                }
              }
            }
          }
        }
      },
      {
        "$group": {
          "_id": "$_id",
          "country": {
            $first: "$country"
          },
          "max_category": {
            $max: "$max_category"
          },
          "shops": {
            "$push": "$shops"
          }
        }
      }
    ])
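
Since the question starts from a Python script, the same stages could be assembled as plain dicts there. This is a hedged sketch rather than a tested driver call: the function name `build_unwind_pipeline` is invented for the example, and `collection` in the final comment is assumed to be an existing pymongo collection.

```python
def build_unwind_pipeline(category_docs):
    """Build the $unwind / $set / $group stages for the given category list."""
    # True when the current category shares at least one fruit with the shop.
    matches_shop = {"$gt": [
        {"$size": {"$setIntersection": ["$$this.fruits", "$shops.fruits"]}},
        0,
    ]}
    # True when the current category beats the accumulator so far.
    is_higher = {"$gt": ["$$this.category", "$$value"]}
    return [
        {"$unwind": "$shops"},
        {"$set": {"max_category": {"$reduce": {
            "input": category_docs,
            "initialValue": None,  # sent to the server as null
            "in": {"$cond": {
                "if": {"$and": [is_higher, matches_shop]},
                "then": "$$this.category",
                "else": "$$value",
            }},
        }}}},
        {"$group": {
            "_id": "$_id",
            "country": {"$first": "$country"},
            "max_category": {"$max": "$max_category"},
            "shops": {"$push": "$shops"},
        }},
    ]


category_docs = [
    {"category": 1, "fruits": ["apple"]},
    {"category": 2, "fruits": ["banana", "kiwi"]},
    {"category": 3, "fruits": ["pineapple", "strawberry"]},
]
pipeline = build_unwind_pipeline(category_docs)
# Assumed usage with pymongo:
# results = list(collection.aggregate(pipeline))
```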
    


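  2. Another option avoids unwinding and regrouping and computes the value in a single $set stage, using only $reduce, $map and $filter. Here categories stands for the mapping from the question; its keys must be strings, which is why $toInt converts the matched key back to a number: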

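    db.collection.aggregate([
      {$set: {max_category: {
            // fold over the shops, keeping the highest category seen so far
            $reduce: {
                input: "$shops",
                initialValue: 0,
                in: {$max: [
                    "$$value",
                    // single-operand $max traverses the array produced by $map
                    {$max: {$map: {
                        input: "$$this.fruits",
                        as: "f",
                        // look up the category key whose fruit list contains $$f
                        in: {$toInt: {$getField: {
                              field: "k",
                              input: {$first: {$filter: {
                                    input: {$objectToArray: categories},
                                    as: "val",
                                    cond: {$in: ["$$f", "$$val.v"]}}
                              }}
                        }}}
                    }}}
                ]}
            }
      }}}
    ])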
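
From Python, the integer keys of the dict have to be converted to strings before the mapping can be embedded in the pipeline. The sketch below is one possible way to build such a stage, not the answer's exact code: the function name `build_reduce_pipeline` is hypothetical, the `$literal` wrapper is an added precaution, and each shop's mapped array is wrapped in a single-operand `$max` so that the outer comparison is between numbers. It assumes a server recent enough to support `$getField`.

```python
def build_reduce_pipeline(categories):
    """Build a single $set stage that computes max_category without $unwind."""
    # Document keys must be strings, so convert the integer keys first.
    string_keyed = {str(key): fruits for key, fruits in categories.items()}

    # Category (as an int) of the fruit bound to $$f.
    category_of_fruit = {"$toInt": {"$getField": {
        "field": "k",
        "input": {"$first": {"$filter": {
            # $literal keeps the mapping from being parsed as an expression.
            "input": {"$objectToArray": {"$literal": string_keyed}},
            "as": "val",
            "cond": {"$in": ["$$f", "$$val.v"]},
        }}},
    }}}

    # Highest category within the shop currently bound to $$this.
    shop_max = {"$max": {"$map": {
        "input": "$$this.fruits",
        "as": "f",
        "in": category_of_fruit,
    }}}

    return [{"$set": {"max_category": {"$reduce": {
        "input": "$shops",
        "initialValue": 0,
        "in": {"$max": ["$$value", shop_max]},
    }}}}]


pipeline = build_reduce_pipeline({
    1: ["apple"],
    2: ["banana", "kiwi"],
    3: ["pineapple", "strawberry"],
})
# Assumed usage with an existing pymongo collection named `collection`:
# results = list(collection.aggregate(pipeline))
```

With `initialValue` set to 0, a document whose shops contain no categorised fruit would end up with `max_category` 0 rather than null.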
    

