MongoDB Aggregation: group documents into equal-sized windows / chunks

Andrii
January 19, 2023

I’m running a MongoDB aggregation to create a new collection from an existing one, and I’m struggling to find a way to group documents by count rather than by value.

I want to achieve something like this:

data:

[
    {"_id": "my_id_0"},
    {"_id": "my_id_1"},
    {"_id": "another_id"},
    {"_id": "another_id_123"},
    {"_id": "_id"},
    {"_id": "document_id"},
    {"_id": "document_id_1"},
    {"_id": "document_id_2"},
    {"_id": "document_id_3"},
    {"_id": "document_id_4"},
]

query:

db.coll.aggregate([
    {
        $someNonExistingStage: {
            output: {
                chunk: {"$push": "$_id"}
            },
            n: 3
        }
    }
])

result:

[
    {"chunk": ["my_id_0", "my_id_1", "another_id"]},
    {"chunk": ["another_id_123", "_id", "document_id"]},
    {"chunk": ["document_id_1", "document_id_2", "document_id_3"]},
    {"chunk": ["document_id_4"]},
]

In practice, the chunks I need would hold roughly 1024 documents each.

I think this might be achievable with $bucketAuto or $setWindowFields, but it looks like I would first have to number all the documents, and it is not clear to me how to do that.

Thanks in advance.

Tags: aggregation-framework mongodb

Answers

Chosen as BEST ANSWER

https://mongoplayground.net/p/zNnIBweQBDf

db.collection.aggregate(
    [
        {
            // number the documents 1, 2, 3, ... in sortBy order
            // ($documentNumber requires a sortBy)
            "$setWindowFields": {
                "sortBy": {"_id": -1},
                "output": {"doc_idx": {"$documentNumber": {}}},
            }
        },
        {
            "$addFields": {
                "chunk_idx": {
                    // replace 3 with your desired chunk size
                    "$floor": {"$divide": [{"$subtract": ["$doc_idx", 1]}, 3]}
                }
            }
        },
        {"$group": {"_id": "$chunk_idx", "chunk": {"$push": "$_id"}}},
    ]
)
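
To make the arithmetic in the pipeline easier to follow, here is a minimal sketch of the same chunk-index calculation in plain JavaScript, with no database involved. The helper name chunkIndex is hypothetical, and the sample ids are taken from the question in their listed order (the pipeline itself numbers documents in its sortBy order, so its chunks may be ordered differently).

```javascript
// Hypothetical helper mirroring the pipeline's arithmetic.
// docIdx is 1-based, like the value $documentNumber produces.
function chunkIndex(docIdx, chunkSize) {
    return Math.floor((docIdx - 1) / chunkSize);
}

// Sample ids from the question, kept in their listed order.
const ids = [
    "my_id_0", "my_id_1", "another_id", "another_id_123", "_id",
    "document_id", "document_id_1", "document_id_2", "document_id_3",
    "document_id_4",
];

// Group the ids by chunk index, the way $group with $push would.
const chunks = new Map();
ids.forEach((id, i) => {
    const key = chunkIndex(i + 1, 3); // i + 1 converts to a 1-based position
    if (!chunks.has(key)) {
        chunks.set(key, []);
    }
    chunks.get(key).push(id);
});

const result = [...chunks.values()].map((chunk) => ({ chunk }));
console.log(JSON.stringify(result, null, 2));
```

Positions 1 to 3 map to index 0, positions 4 to 6 to index 1, and so on, so the last chunk simply holds whatever is left over.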


- BuzzMoschetti
- January 19, 2023 at 2:55 pm
- 0 votes
The database isn’t really doing anything for you in this scenario. We are neither filtering nor grouping documents to reduce the amount of material pulled from the collection and transmitted to the client, and we are not exploiting indexes. We might as well just run a loop on the client side:
```
// Pull up to `size` documents from the cursor.
// An empty array means the cursor is exhausted.
function vendChunk(cursor, size) {
    const chunk = [];
    for (let i = 0; i < size; i++) {
        if (!cursor.hasNext()) {
            break;
        }
        chunk.push(cursor.next());
    }
    return chunk;
}

const c = db.foo.find(); // or find(predicate) if desired...

while (true) {
    const chunk = vendChunk(c, 4); // 4 is the chunk size
    if (chunk.length === 0) {
        break;
    }
    print("chunk: ", chunk);
}
```
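
The same client-side idea can also be sketched as a generator over any synchronous iterable, which keeps the chunking logic separate from the cursor API. This is only an illustration: the name chunked is hypothetical, the input here is a plain array, and whether a particular cursor object can be iterated with for...of is an assumption you would need to check for your shell or driver.

```javascript
// Hypothetical generator: yields arrays of up to `size` items from any iterable.
function* chunked(iterable, size) {
    if (!Number.isInteger(size) || size < 1) {
        throw new RangeError("size must be a positive integer");
    }
    let chunk = [];
    for (const item of iterable) {
        chunk.push(item);
        if (chunk.length === size) {
            yield chunk;
            chunk = []; // start a fresh array for the next chunk
        }
    }
    if (chunk.length > 0) {
        yield chunk; // trailing partial chunk
    }
}

// Usage with a plain array standing in for the documents.
const sampleDocs = Array.from({ length: 10 }, (_, i) => ({ _id: "doc_" + i }));
for (const chunk of chunked(sampleDocs, 4)) {
    console.log("chunk:", chunk.map((doc) => doc._id));
}
```

Only one chunk is held in memory at a time, which matters more once the chunk size is around 1024 rather than 4.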