skip to Main Content

I’m trying to get a list of current holders at specific times from a collection. My collection looks like this:

[
  {
    "time": 1,
    "holdings": [
      { "owner": "A", "tokens": 2 },
      { "owner": "B", "tokens": 1 }
    ]
  },
  {
    "time": 2,
    "holdings": [
      { "owner": "B", "tokens": 2 }
    ]
  },
  {
    "time": 3,
    "holdings": [
      { "owner": "A", "tokens": 3 },
      { "owner": "B", "tokens": 1 },
      { "owner": "C", "tokens": 1 }
    ]
  },
  {
    "time": 4,
    "holdings": [
      { "owner": "C", "tokens": 0 }
    ]
  }
]

tokens show the current holdings of an owner if the holdings have changed to the last document. I would like to change the collection so that holdings always includes the full current holdings for any point in time.

At time: 1, the holdings are: A: 2, B: 1.
At time: 2, the holdings are: A: 2, B: 2. The collections does not include A‘s holdings however, because they haven’t changed. So what I’d like to get is:

[
  {
    "time": 1,
    "holdings": [
      { "owner": "A", "tokens": 2 },
      { "owner": "B", "tokens": 1 }
    ]
  },
  {
    "time": 2,
    "holdings": [
      { "owner": "A", "tokens": 2 },  // merged from prev doc.
      { "owner": "B", "tokens": 2 }
    ]
  },
  {
    "time": 3,
    "holdings": [
      { "owner": "A", "tokens": 3 },
      { "owner": "B", "tokens": 1 },
      { "owner": "C", "tokens": 1 }
    ]
  },
  {
    "time": 4,
    "holdings": [
      { "owner": "A", "tokens": 3 },  // merged from prev
      { "owner": "B", "tokens": 1 },  // merged from prev
      { "owner": "C", "tokens": 0 }
    ]
  }
]

From what I understand $mergeObjects does that, but I don’t understand how I can merge all previous docs in order up to the current doc for each doc. So I’m looking for a way to combine setWindowFields with mergeObjects I think.

2

Answers


  1. This is a nice challenge.

    So far, I got this complicated solution:

    1. Get all of our timestamps in all of our documents. This is the purpose of the first 4 steps. $setWindowFields is used to accumulate this data.
    2. $group by owner and calculate the empty timestamps as wantedTimes– next 5 steps.
    3. $set empty timestamps with tokens: null to be filled with actual data and $unwind to separate – next 3 steps
    4. Use $setWindowFields to find the last known token for each owner at each timestamp.
    5. Fill this last known state for documents with unknown token – 2 steps
    6. $group and format answer:
    db.collection.aggregate([
      {
        $setWindowFields: {
          sortBy: {time: 1},
          output: {
            allTimes: {$addToSet: "$time", window: {documents: ["unbounded", "current"]}
            }
          }
        }
      },
      {
        $setWindowFields: {
          sortBy: {time: -1},
          output: {
            allTimes: {$addToSet: "$allTimes", window: {documents: ["unbounded", "current"]}
            }
          }
        }
      },
      {
        $set: {
          allTimes: {
            $reduce: {
              input: "$allTimes",
              initialValue: [],
              in: {"$concatArrays": ["$$value", "$$this"]}
            }
          }
        }
      },
      {$set: {allTimes: {$setIntersection: "$allTimes"}}},
      {$unwind: "$holdings"},
      {$sort: {time: 1}},
      {$group: { _id: "$holdings.owner",
          tokens: {$push: {tokens: "$holdings.tokens", time: "$time"}},
          times: {$push: "$time"}, firstTime: {$first: "$time"},
          allTimes: {$first: "$allTimes"}}
      },
      {
        $addFields: {
          wantedTimes: {
            $filter: {
              input: "$allTimes",
              as: "item",
              cond: {$gte: ["$$item", "$firstTime"]}
            }
          }
        }
      },
      {
        $project: {
          tokens: 1,
          wantedTimes: {$setDifference: ["$wantedTimes", "$times"]}
        }
      },
      {
        $set: {
          data: {
            $map: {
              input: "$wantedTimes",
              as: "item",
              in: {time: "$$item", tokens: null}
            }
          }
        }
      },
      {$project: {tokens: {"$concatArrays": ["$tokens", "$data"]}}},
      {$unwind: "$tokens"},
      {
        $setWindowFields: {
          partitionBy: "$_id",
          sortBy: {"tokens.time": 1},
          output: {
            lastTokens: {
              $push: "$tokens.tokens",
              window: {documents: ["unbounded", "current"]}
            }
          }
        }
      },
      {
        $set: {
          lastTokens: {
            $filter: {
              input: "$lastTokens",
              as: "item",
              cond: {$ne: ["$$item", null]}
            }
          }
        }
      },
      {
        $set: {
          "tokens.tokens": {$ifNull: ["$tokens.tokens", {$last: "$lastTokens"}]}
        }
      },
      {
        $group: {
          _id: "$tokens.time",
          holdings: {$push: {owner: "$_id", tokens: "$tokens.tokens" }}
        }
      },
      {$project: {time: "$_id", holdings: 1, _id: 0}},
      {$sort: {time: 1}}
    ])
    

    Playground example

    Login or Signup to reply.
  2. From a performance perspective I recommend you split it into 2 calls, the first will be a quick findOne just to get the maximum time value in the collection.

    Once you have that value the pipeline can be much leaner:

    const maxItem = await db.collection.findOne({}).sort({ time: -1 });
    
    db.collection.aggregate([
      {
        $unwind: "$holdings"
      },
      {
        $group: {
          _id: "$holdings.owner",
          times: {
            $push: {
              time: "$time",
              tokens: "$holdings.tokens"
            }
          },
          minTime: {
            $min: "$time"
          }
        }
      },
      {
        $addFields: {
          times: {
            $reduce: {
              input: {
                $range: [
                  "$minTime",
                  maxItem.time + 1 // this is max time
                ]
              },
              initialValue: {
                values: [],
                lastIndex: 0
              },
              in: {
                values: {
                  "$concatArrays": [
                    "$$value.values",
                    [
                      {
                        $cond: [
                          {
                            $in: [
                              "$$this",
                              "$times.time"
                            ]
                          },
                          {
                            "$arrayElemAt": [
                              "$times",
                              "$$value.lastIndex"
                            ]
                          },
                          {
                            "$mergeObjects": [
                              {
                                tokens: 0
                              },
                              {
                                "$arrayElemAt": [
                                  "$times",
                                  {
                                    $subtract: [
                                      "$$value.lastIndex",
                                      1
                                    ]
                                  }
                                ]
                              },
                              {
                                time: "$$this"
                              }
                            ]
                          }
                        ]
                      }
                    ]
                  ]
                },
                lastIndex: {
                  $cond: [
                    {
                      $in: [
                        "$$this",
                        "$times.time"
                      ]
                    },
                    {
                      $sum: [
                        "$$value.lastIndex",
                        1
                      ]
                    },
                    "$$value.lastIndex"
                  ]
                }
              }
            }
          }
        }
      },
      {
        $unwind: "$times.values"
      },
      {
        $group: {
          _id: "$times.values.time",
          holdings: {
            $push: {
              owner: "$_id",
              tokens: "$times.values.tokens"
            }
          }
        }
      },
      {
        $project: {
          _id: 0,
          time: "$_id",
          holdings: 1
        }
      },
      {
        $sort: {
          time: 1
        }
      }
    ])
    

    This is still quite a heavy query as it requires to $unwind and $group the entire collection, however there is no workaround this due to the requirements. if the collection is too big for this approach I recommend iteration owner by owner, or time by time and doing separate updates accordingly.

    Mongo Playground

    If you don’t care about performance at all and want it in a single query you can still use the same pipeline, you will have to first extract the max time in the collection, this will require you to add an initial $group stage, like so:

    db.collection.aggregate([
      {
        $group: {
          _id: null,
          maxTime: {
            $max: "$time"
          },
          roots: {
            $push: "$$ROOT"
          }
        }
      },
      {
        $unwind: "$roots"
      },
      {
        $replaceRoot: {
          newRoot: {
            "$mergeObjects": [
              "$roots",
              {
                maxTime: "$maxTime"
              }
            ]
          }
        }
      },
      ... same pipeline ...
    ])
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search