skip to Main Content

I have a collection of phrases with the following structure:

  • phrase – string
  • phraseLength (the length of the phrase string)
  • author – string

I would like to group phrases by author. And for each author, the maximum length of all that author’s phrases shouldn’t exceed 200 characters.

In other words, I am trying to figure out how to create an aggregate query that will:

  • loop through each phrase
  • If current author’s total running phrases length is less than 200 characters, then push it to the author’s phrases array

So for example with the following documents (I added incorrect phrase length for each document for ease of understanding):

[
    {
        phrase: 'This is phrase 1 of author 1',
        phraseLength: 50,
        author: 'Author 1',
    },
    {
        phrase: 'This is phrase 1 of author 1',
        phraseLength: 150,
        author: 'Author 1',
    },
    {
        phrase: 'This is phrase 1 of author 1',
        phraseLength: 10,
        author: 'Author 1',
    },
    {
        phrase: 'This is phrase 1 of author 2',
        phraseLength: 20,
        author: 'Author 2',
    },
    {
        phrase: 'This is phrase 2  of author 2',
        phraseLength: 180,
        author: 'Author 2',
    },
    {
        phrase: 'This is phrase 3  of author 2',
        phraseLength: 50,
        author: 'Author 2',
    },
]

This is the desired output:

[
    {
        _id: 'Author 1',
        phrases: [
            {
                phrase: 'This is phrase 1 of author 1',
                phraseLength: 50,
                author: 'Author 1',
            },
            {
                phrase: 'This is phrase 1 of author 1',
                phraseLength: 150,
                author: 'Author 1',
            },
        ]
    },
    {
        _id: 'Author 2',
        phrases: [
            {
                phrase: 'This is phrase 1 of author 2',
                phraseLength: 20,
                author: 'Author 2',
            },
            {
                phrase: 'This is phrase 2  of author 2',
                phraseLength: 180,
                author: 'Author 2',
            },
        ]
    },
]

I know how to group all phrases by author like this:

db.phrases.aggregate([
    {
        $group: {
            _id: '$author',
            phrases: {
                $push: { phrase: '$phrase', phraseLength: '$phraseLength', author: '$author' }
            }
        }
    },
])

And I have tried the following query to accomplish the desired outcome but it didn’t work.

db.phrases.aggregate([
    {
        $group: {
            _id: "$author",
            runningTotalPhraseLength: { $sum: '$phraseLength' },
            $cond: {
                if: {
                    $lte: [{ $size: '$runningTotalPhraseLength' }, 200]
                },
                then: {
                    phrases: {
                        $push: { phrase: '$phrase', phraseLength: '$phraseLength', author: '$author' }
                    }
                },
                else: {},
            },
        }
    },
])

Any help is very much appreciated! 🙂

2

Answers


  1. I think it is not possible to define the conditional accumulate push logic in the $group stage.

    But you need another $set stage with $reduce operator.

    1. $group

    2. $set – Create a temp field.

      2.1. $reduce – Iterate each element in phrases array and return a new object.

      2.1.1. inputphrases array.

      2.1.2. initialValue – Initialize an object with length and phrases fields.

      2.1.3. in

      2.1.3.1. $cond – Compare if the accumulated value.length plus the length for current iterate object is lesser or equal to 200.

      If true, then set the object with length: $$value.length + $$this.length and phrases with add the iterated phrase object to the accumulated $$value.phrases.

      Else, remain the same value for the accumulated $$value.

    3. $set – Replace phrases array with temp.phrases.

    4. $unset – Remove temp field.

    db.phrases.aggregate([
      {
        $group: {
          _id: "$author",
          phrases: {
            $push: {
              phrase: "$phrase",
              phraseLength: "$phraseLength",
              author: "$author"
            }
          }
        }
      },
      {
        $set: {
          temp: {
            $reduce: {
              input: "$phrases",
              initialValue: {
                length: 0,
                phrases: []
              },
              in: {
                $cond: {
                  if: {
                    $lte: [
                      {
                        $sum: [
                          "$$this.phraseLength",
                          "$$value.length"
                        ]
                      },
                      200
                    ]
                  },
                  then: {
                    length: {
                      $sum: [
                        "$$this.phraseLength",
                        "$$value.length"
                      ]
                    },
                    phrases: {
                      $concatArrays: [
                        [
                          "$$this"
                        ],
                        "$$value.phrases"
                      ]
                    }
                  },
                  else: "$$value"
                }
              }
            }
          }
        }
      },
      {
        $set: {
          phrases: "$temp.phrases"
        }
      },
      {
        $unset: "temp"
      }
    ])
    

    Demo @ Mongo Playground

    Login or Signup to reply.
  2. Using $setWindowFields to compute cumulative length and then conditional grouping,

    db.phrases.aggregate([
      {
        $setWindowFields: {
          partitionBy: "$author",
          sortBy: {null: 1},
          output: {
            "cumuLength": {
              $sum: "$phraseLength",
              window: {
                documents: ["unbounded","current"]
              }
            }
          }
        }
      },
      {
        $group: {
          _id: "$author",
          phrases: {
            $push: {
              $cond: [
                {$lte: ["$cumuLength",200]},
                {
                  phrase: "$phrase",
                  phraseLength: "$phraseLength",
                  author: "$author"
                },
                "$$REMOVE"
              ]
            }
          }
        }
      }
    ])
    

    Demo

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search