We have three nested arrays:
principalCredits
with 2 objectscredits
with 2 objects eachawardNominations.edges
with variable totals from 0 to 3
The task is to add a field to the third array of objects awardNominations.edges
based on a lookup from eventsCollection
.
Here’s the data I have (simplified, can copy and paste into MongoDB Compass):
[{
"principalCredits": [
{
"category": {
"id": "director",
"text": "Directors"
},
"totalCredits": 2,
"credits": [
{
"name": {
"id": "nm11813828",
"nameText": {
"text": "Pippa Ehrlich"
},
"awardNominations": {
"total": 2,
"edges": [
{
"node": {
"id": "an1393007",
"isWinner": true,
"award": {
"id": "an1393007",
"year": 2020,
"text": "Green Warsaw Award",
"event": {
"id": "ev0003786",
"text": "Millennium Docs Against Gravity"
},
"category": {
"text": null
}
}
}
},
{
"node": {
"id": "an1428940",
"isWinner": false,
"award": {
"id": "an1428940",
"year": 2021,
"text": "IDA Award",
"event": {
"id": "ev0000351",
"text": "International Documentary Association"
},
"category": {
"text": "Best Writing"
}
}
}
},
]
}
},
"category": {
"id": "director",
"text": "Director"
}
},
{
"name": {
"id": "nm1624755",
"nameText": {
"text": "James Reed"
},
"awardNominations": {
"total": 3,
"edges": [
{
"node": {
"id": "an0694012",
"isWinner": true,
"award": {
"id": "an0694012",
"year": 2015,
"text": "Best of Festival",
"event": {
"id": "ev0001486",
"text": "Jackson Wild Media Awards"
},
"category": {
"text": "Best of Festival"
}
}
}
},
{
"node": {
"id": "an0975779",
"isWinner": true,
"award": {
"id": "an0975779",
"year": 2017,
"text": "RTS West Television Award",
"event": {
"id": "ev0000571",
"text": "Royal Television Society, UK"
},
"category": {
"text": "Documentary"
}
}
}
},
{
"node": {
"id": "an0975781",
"isWinner": true,
"award": {
"id": "an0975781",
"year": 2015,
"text": "Grand Teton Prize",
"event": {
"id": "ev0001356",
"text": "Jackson Hole Film Festival"
},
"category": {
"text": "Best in Festival"
}
}
}
}
]
}
},
"category": {
"id": "director",
"text": "Director"
}
}
]
},
{
"category": {
"id": "writer",
"text": "Writers"
},
"totalCredits": 2,
"credits": [
{
"name": {
"id": "nm11813828",
"nameText": {
"text": "Pippa Ehrlich"
},
"awardNominations": {
"total": 2,
"edges": [
{
"node": {
"id": "an1393007",
"isWinner": true,
"award": {
"id": "an1393007",
"year": 2020,
"text": "Green Warsaw Award",
"event": {
"id": "ev0003786",
"text": "Millennium Docs Against Gravity"
},
"category": {
"text": null
}
}
}
},
{
"node": {
"id": "an1428940",
"isWinner": false,
"award": {
"id": "an1428940",
"year": 2021,
"text": "IDA Award",
"event": {
"id": "ev0000351",
"text": "International Documentary Association"
},
"category": {
"text": "Best Writing"
}
}
}
}
]
}
},
"category": {
"id": "writer",
"text": "Writer"
},
},
{
"name": {
"id": "nm1624755",
"nameText": {
"text": "James Reed"
},
"awardNominations": {
"total": 0,
"edges": []
}
},
"category": {
"id": "writer",
"text": "Writer"
},
}
]
}
]
}]
An example scored award should look like this:
{
"id": "an0975781",
"isWinner": true,
"award": { ... },
"score": 1.5
}
Once all the manipulation is done, the data needs to be in exactly the same shape as it was initially and with no null values. So in the case of the last array awardsNominations.edges
it should be []
as it was, and not { node: { score: null }}
or anything else.
To achieve this I have created an aggregation pipeline:
[
{
'$unwind': {
'path': '$principalCredits',
'preserveNullAndEmptyArrays': true
}
}, {
'$unwind': {
'path': '$principalCredits.credits',
'preserveNullAndEmptyArrays': true
}
}, {
'$unwind': {
'path': '$principalCredits.credits.name.awardNominations.edges',
'preserveNullAndEmptyArrays': true
}
}, {
'$lookup': {
'from': 'eventsCollection',
'localField': 'principalCredits.credits.name.awardNominations.edges.node.award.event.id',
'foreignField': 'id',
'as': 'matchingEvent'
}
}, {
'$unwind': {
'path': '$matchingEvent',
'preserveNullAndEmptyArrays': true
}
}, {
'$addFields': {
'principalCredits.credits.name.awardNominations.edges.node.score': {
'$multiply': [
'$matchingEvent.importance', {
'$cond': {
'if': '$principalCredits.credits.name.awardNominations.edges.node.isWinner',
'then': 1.5,
'else': 1.2
}
}
]
}
}
}
]
The above pipeline assigns the score
to each award. However, the null values are still there and I have absolutely no idea how to group
it back together. I have tried to group with:
{
'$group': {
'_id': '$id',
'titleDoc': {
'$first': '$$ROOT'
},
'allPrincipalCredits': {
'$push': '$principalCredits'
}
}
}
To keep the root and then somehow sort all the records back into shape but could not get back to the orginal object structure.
Any help in putting it all together will be much appriciated!
I’m fairly good with simple aggregations, but this seems to be too much for me currently and would love to learn how to $group
things back properly.
I’ve tried and put together all the knowledge I have so far from different sources and similar answers but can’t seem to get it to work.
Lookup collection eventsCollection
contains objects like this:
{
"_id": { "$oid": "62c57125d6943d92f83f6fff" },
"id": "ev0030197",
"text": "#AmLatino Film Festival",
"importance": 1
}
2
Answers
The deep buried keys (
...award.event.id
) in arrays confounds an easy approach without 1) messing up the structure as the OP has noted 2) incurring potentially very expensive multiple$unwind
calls.Recommendation: Two pass approach. Get the necessary
importance
values for theprincipalCredits
objects in question, then go back and manually iterate over the collection, diving into the structure and applying the logicscore = importance * isWinner? 1.2 : 1.5
PASS 1: Get the ev data
PASS 2: Iterate main collection
Left as exercise to reader.
Note that if
$match
on the initialprincipalCredits
collection (i.e. before the fancy$project/$reduce
) to significantly reduce the lookup set intoevents
then this whole thing is unnecessary. Simply slurp all events into
evdict
with a quickfind
and proceed to pass 2.There is potentially a very cool solution that can do this in one pass
UPDATED
See Tom’s answer below.
Note to MongoDB 5.0 users: The new
$getField
function allows you to pluck out fields by name instead of having to use the standard trick of using dot notation in the$in
clause to access the field. This might be clearer to some:So the "rule" in restoring to original structure is that for each
$unwind
you did to "deconstruct" the document you now have to do a$group
to restore it.As you can imagine in such a pipeline this could be VERY cumbersome. but definitely doable.
However let me propose a different approach that is still very messy but much easier compared to the alternative, additionally it is more efficient from a performance perspective.
(just minor sidenot the reason your score is still
null
is because you have a syntax error in your$multiply
function)Anyways, The idea is to first gather all the unique event ids that exist in the in nested documents.
Then execute one lookup to fetch all the relevant events.
And finally adding the
score
field using$map
and$mergeDocuments
instead of$unwind
ing and$group
ing, like so:Mongo Playground
Mongo Playground
I will just mention that you can make this much much much cleaner by involving some code while keeping the same approach suggested. first getting unique
eventid
withdistinct
. then fetching the matching importance for each event. Finally execute a single query usingarrayFilters
you can construct with this information.null
or missing values. So if an array is missing an error will be thrown as$map
expects input to be a valid array.This can easily be solved by just wrapping each of these expressions with
$ifNull
, like so:This will also replace
null
values with an empty[]