I’m trying to get a list of current holders at specific times from a collection. My collection looks like this:
[
{
"time": 1,
"holdings": [
{ "owner": "A", "tokens": 2 },
{ "owner": "B", "tokens": 1 }
]
},
{
"time": 2,
"holdings": [
{ "owner": "B", "tokens": 2 }
]
},
{
"time": 3,
"holdings": [
{ "owner": "A", "tokens": 3 },
{ "owner": "B", "tokens": 1 },
{ "owner": "C", "tokens": 1 }
]
},
{
"time": 4,
"holdings": [
{ "owner": "C", "tokens": 0 }
]
}
]
tokens
show the current holdings of an owner if the holdings have changed to the last document. I would like to change the collection so that holdings
always includes the full current holdings for any point in time.
At time: 1
, the holdings are: A: 2, B: 1
.
At time: 2
, the holdings are: A: 2, B: 2
. The collections does not include A
‘s holdings however, because they haven’t changed. So what I’d like to get is:
[
{
"time": 1,
"holdings": [
{ "owner": "A", "tokens": 2 },
{ "owner": "B", "tokens": 1 }
]
},
{
"time": 2,
"holdings": [
{ "owner": "A", "tokens": 2 }, // merged from prev doc.
{ "owner": "B", "tokens": 2 }
]
},
{
"time": 3,
"holdings": [
{ "owner": "A", "tokens": 3 },
{ "owner": "B", "tokens": 1 },
{ "owner": "C", "tokens": 1 }
]
},
{
"time": 4,
"holdings": [
{ "owner": "A", "tokens": 3 }, // merged from prev
{ "owner": "B", "tokens": 1 }, // merged from prev
{ "owner": "C", "tokens": 0 }
]
}
]
From what I understand $mergeObjects
does that, but I don’t understand how I can merge all previous docs in order up to the current doc for each doc. So I’m looking for a way to combine setWindowFields
with mergeObjects
I think.
2
Answers
This is a nice challenge.
So far, I got this complicated solution:
$setWindowFields
is used to accumulate this data.$group
by owner and calculate the empty timestamps aswantedTimes
– next 5 steps.$set
empty timestamps withtokens: null
to be filled with actual data and$unwind
to separate – next 3 steps$setWindowFields
to find the last known token for each owner at each timestamp.$group
and format answer:Playground example
From a performance perspective I recommend you split it into 2 calls, the first will be a quick
findOne
just to get the maximum time value in the collection.Once you have that value the pipeline can be much leaner:
This is still quite a heavy query as it requires to
$unwind
and$group
the entire collection, however there is no workaround this due to the requirements. if the collection is too big for this approach I recommend iteration owner by owner, or time by time and doing separate updates accordingly.Mongo Playground
If you don’t care about performance at all and want it in a single query you can still use the same pipeline, you will have to first extract the max time in the collection, this will require you to add an initial
$group
stage, like so: