Preserve all groups generate by first $group when using the second $group in MongoDB aggregation

Auresol
July 31, 2024
213 views
0 votes
2 Answers

I have one collection that the structure of the record look like this,

{
  _id: ObjectId('66a544dc3a8dff9aaaf65343'),
  user_type: "USER",
  status: CREATE
}

The explanation of each field:

_id : ID generate by MongoDB
user_type : the record’s user type . Can only be USER, ADMIN, DEV
status : the status. Can only be CREATE, ACTIVE, DELETE

I want to run the aggregate to get this output

{
  {
    _id: "ALL",
    stat: {
      "CREATE" : 2,
      "ACTIVE" : 5,
      "DELETE" : 4
    }
    "total": 11
  {
    _id: "USER",
    stat: {
      "CREATE" : 1,
      "ACTIVE" : 2,
      "DELETE" : 0
    },
    "total": 3
  },
  {
    _id: "ADMIN",
    stat: {
      "CREATE" : 0,
      "ACTIVE" : 3,
      "DELETE" : 0
    },
    "total": 3
  },
  {
    _id: "DEV",
    stat: {
      "CREATE" : 1,
      "ACTIVE" : 0,
      "DELETE" : 4
    },
    "total": 5
  }
}

I want to count each occurrence of status for all user_type, compute total number of all status, and create a TOTAL document to count the entire collection

I can easily generate all documents except _id: "ALL" document easily using $group with $project. The aggregate command look like this,

db.getCollection('accounts').aggregate(
  [
    {
      $group: {
        _id: '$user_type',
        CREATE: {
          $sum: {
            $cond: [
              {
                $eq: ['$$ROOT.status', 'CREATE']
              },
              1,
              0
            ]
          }
        },
        ACTIVE: {
          $sum: {
            $cond: [
              {
                $eq: ['$$ROOT.status', 'ACTIVE']
              },
              1,
              0
            ]
          }
        },
        DELETE: {
          $sum: {
            $cond: [
              {
                $eq: ['$$ROOT.status', 'DELETE']
              },
              1,
              0
            ]
          }
        },
      }
    },
    {
      $project: {
        stat: {
          CREATE: '$CREATE',
          ACTIVE: '$ACTIVE',
          DELETE: '$DELETE',
        },
        total: {
          $sum: [
            '$CREATE',
            '$ACTIVE',
            '$DELETE',
          ]
        }
      }
    }
  ],
);

The problem is the second $group command with _id: null

The problem arise when I want to generate another document. I can put another $group command (using _id: null), but it will delete all others record (the only thing left will be the record which _id: null)

Is there a way to preserve the previous records in pipeline? or is there an alternative solution which is better that this? any comment is welcome.

Thank you very much!

Answers

I can think of 2 ways to do it.

use $facet to perform the grouping sum twice. This might involve more data wrangling at later stages, but this should be the most performant and the canonical way to do what you expect.

db.accounts.aggregate([
  {
    "$facet": {
      "accounts": [
        {
          "$group": {
            "_id": "$user_type",
            CREATE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "CREATE"
                    ]
                  },
                  1,
                  0
                ]
              }
            },
            ACTIVE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "ACTIVE"
                    ]
                  },
                  1,
                  0
                ]
              }
            },
            DELETE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "DELETE"
                    ]
                  },
                  1,
                  0
                ]
              }
            }
          }
        }
      ],
      "all": [
        {
          "$group": {
            "_id": "ALL",
            CREATE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "CREATE"
                    ]
                  },
                  1,
                  0
                ]
              }
            },
            ACTIVE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "ACTIVE"
                    ]
                  },
                  1,
                  0
                ]
              }
            },
            DELETE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "DELETE"
                    ]
                  },
                  1,
                  0
                ]
              }
            }
          }
        }
      ]
    }
  },
  {
    "$project": {
      "docs": {
        "$concatArrays": [
          "$accounts",
          "$all"
        ]
      }
    }
  },
  {
    "$unwind": "$docs"
  },
  {
    "$replaceRoot": {
      "newRoot": "$docs"
    }
  },
  {
    $project: {
      stat: {
        CREATE: "$CREATE",
        ACTIVE: "$ACTIVE",
        DELETE: "$DELETE"
      },
      total: {
        $sum: [
          "$CREATE",
          "$ACTIVE",
          "$DELETE"
        ]
      }
    }
  }
])

Mongo Playground

use $$unionWith after your current $group stage to perform the grouping sum again.

db.accounts.aggregate([
  {
    "$group": {
      "_id": "$user_type",
      CREATE: {
        $sum: {
          $cond: [
            {
              $eq: [
                "$status",
                "CREATE"
              ]
            },
            1,
            0
          ]
        }
      },
      ACTIVE: {
        $sum: {
          $cond: [
            {
              $eq: [
                "$status",
                "ACTIVE"
              ]
            },
            1,
            0
          ]
        }
      },
      DELETE: {
        $sum: {
          $cond: [
            {
              $eq: [
                "$status",
                "DELETE"
              ]
            },
            1,
            0
          ]
        }
      }
    }
  },
  {
    "$unionWith": {
      "coll": "accounts",
      "pipeline": [
        {
          "$group": {
            "_id": "ALL",
            CREATE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "CREATE"
                    ]
                  },
                  1,
                  0
                ]
              }
            },
            ACTIVE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "ACTIVE"
                    ]
                  },
                  1,
                  0
                ]
              }
            },
            DELETE: {
              $sum: {
                $cond: [
                  {
                    $eq: [
                      "$status",
                      "DELETE"
                    ]
                  },
                  1,
                  0
                ]
              }
            }
          }
        }
      ]
    }
  },
  {
    $project: {
      stat: {
        CREATE: "$CREATE",
        ACTIVE: "$ACTIVE",
        DELETE: "$DELETE"
      },
      total: {
        $sum: [
          "$CREATE",
          "$ACTIVE",
          "$DELETE"
        ]
      }
    }
  }
])

Mongo Playground

Strictly speaking, both of them are just workarounds and do not actually address the need to reuse previously computed results. Although not likely, they might incur performance issues since they are grouping twice. If the data does not change frequently or you have more tolerance for stale data, you may consider materializing the computed result and perform further grouping on it.

- JohanSebastianGonzalez
- July 31, 2024 at 10:18 pm
- 0 votes
0
Add the following pipelines to the aggregation.
```
  {
    $group: {
      _id: "ALL",

      CREATE: { $sum: "$stat.CREATE" },
      ACTIVE: { $sum: "$stat.ACTIVE" },
      DELETE: { $sum: "$stat.DELETE" },
      total: { $sum: "$total" },
      groups: { $push: "$$ROOT" },
    },
  },
  {
    $project: {
      stat: {
        CREATE: "$CREATE",
        ACTIVE: "$ACTIVE",
        DELETE: "$DELETE",
      },
      total: 1,
      groups: 1,
    },
  },
```
In the groups field, you can find the document by $user_type.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.