skip to Main Content

Given the following dataset:

category date amount
fruit 2023-01-01 1
fruit 2023-01-02 2
meat 2023-01-03 3
nuts 2023-01-04 4
nuts 2023-01-05 5
nuts 2023-01-06 6

Is it possible to do a multi-group aggregation in Mongo JPA that yields:

  • One row per category
  • Count of rows in that category
  • Amount on max(date)
category count last_amount
fruit 2 2
meat 1 3
nuts 3 6

This should be possible in a relational database with some SQL grouping and possibly sub-queries but seems incredibly difficult to achieve in Mongo. I could get get the row count in one aggregation but seems like I would need multiple operations to achieve what I need.

2

Answers


  1. Chosen as BEST ANSWER

    This was genuinely surprisingly difficult to achieve. Charchit pointed me in the right direction but to get to the exact response I wanted required so much more effort.

    Explanation of stages required:

    • $group: group the collection by category and obtain the $sum and the $max($date) of each category
    • $lookup: join the collection with itself to obtain the amount at the maxDate
    • the $lookup pipeline is required because a multi-field match needs to be performed ensuring the match is for the maxDate only for that given category
    • the $lookup let is required because $lookup pipeline somehow cannot reference a previous stage field directly. note the double $ required to reference let variables like $$maxDate
    • $project is used because we are only interested in the amount, so the rest of the $lookup document needs to be discarded
    • $replaceRoot + $mergeObjects + $arrayElemAt is used to merge the first entry in the $lookup result up into the root document
    • the final $project removes the (now) unneeded maxDate and latestAmount working fields

    Here's the resulting query that meets the exact specification in the question:

    db.collection.aggregate([
      {
        $group: {
          _id: "$category",
          count: { $sum: 1 },
          maxDate: { $max: "$date" }
        }
      },
      {
        $lookup: {
          from: "collection",
          let: {
            category: "$_id",
            maxDate: "$maxDate"
          },
          pipeline: [
            { $match: {
                $expr: {
                  $and: [
                    { $eq: [ "$category", "$$category" ] },
                    { $eq: [ "$date", "$$maxDate" ] }
                  ]
                }
            }},
            { $project: { _id: 0, amount: 1 } }
          ],
          as: "latestAmount"
        }
      },
      {
        $replaceRoot: {
          newRoot: {
            $mergeObjects: [
              { $arrayElemAt: [ "$latestAmount", 0 ] }, "$$ROOT"
            ]
          }
        }
      },
      { $project: { maxDate: 0, latestAmount: 0 } }
    ])
    

    Playground


  2. Try this:

    db.collection.aggregate([
      {
        "$addFields": {
          "dateValue": {
            "$toDate": "$date"
          }
        }
      },
      {
        "$sort": {
          dateValue: 1
        }
      },
      {
        "$group": {
          "_id": "$category",
          "count": {
            "$sum": 1
          },
          "lastAmount": {
            "$last": "$amount"
          }
        }
      }
    ])
    

    In this query, we first convert the string date into a date object. This step is not required if your dates are already stored as dates. Next, we sort the documents, by date in ascending order. Finally, we group them by category, calculating the count and last_amount.

    Playground link.

    You can use this if you don’t want to use sorting.

    db.collection.aggregate([
      {
        "$group": {
          "_id": "$category",
          "count": {
            "$sum": 1
          },
          "lastAmount": {
            "$max": "$amount"
          }
        }
      }
    ])
    

    Playground.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search