MongoDB fill missing dates in aggregation pipeline

Vaggelis
October 22, 2022
157 views
1 vote
2 Answers

I have this pipeline :

    let pipeline = [
      {
        $match: {
          date: { $gte: new Date("2022-10-19"), $lte: new Date("2022-10-26") },
        },
      },

      {
        $group: {
          _id: "$date",
          tasks: { $push: "$$ROOT" },
        },
      },
      {
        $sort: { _id: -1 },
      },
    ];

    const aggregationData = await ScheduleTaskModel.aggregate(pipeline);

where i group all "tasks" between a date range by date and i get that result :

[
    {
        "date": "2022-10-21T00:00:00.000Z",
        "tasks": [...tasks with this date]
    },
    {
        "date": "2022-10-20T00:00:00.000Z",
        "tasks": [...tasks with this date]
    }
]

as you see i have "tasks" only for 2 dates in that range,what if i want all dates to appear even the ones with no tasks so it would be like this with empty arrays ?

[
    {
        "date": "2022-10-26T00:00:00.000Z",
        "tasks": []
    },
    {
        "date": "2022-10-25T00:00:00.000Z",
        "tasks": []
    },
    {
        "date": "2022-10-24T00:00:00.000Z",
        "tasks": []
    },
    {
        "date": "2022-10-23T00:00:00.000Z",
        "tasks": []
    },
    {
        "date": "2022-10-22T00:00:00.000Z",
        "tasks": []
    },
    {
        "date": "2022-10-21T00:00:00.000Z",
        "tasks": [...tasks with this date]
    },
    {
        "date": "2022-10-20T00:00:00.000Z",
        "tasks": [...tasks with this date]
    },
    {
        "date": "2022-10-19T00:00:00.000Z",
        "tasks": []
    },
]

i tried to use $densify but unfortunately it requires upgrading my mongoDb atlas cluster which is not possible..

Answers

New function $densify would be the simplest, of course. The manual way of doing it would be this one:

db.collection.aggregate([
  {
    $group: {
      _id: null,
      data: { $push: "$$ROOT" }
    }
  },
  {
    $set: {
      dates: {
        $map: {
          input: { $range: [ 0, 8 ] }, // maybe more dynamic with $dateDiff -> { $dateDiff: { startDate: new Date("2022-10-19"), endDate: new Date("2022-10-26") }, unit: "day" } }
          in: {
            date: {
              $dateAdd: {
                startDate: ISODate("2022-10-19T00:00:00.000Z"),
                unit: "day",
                amount: "$$this"
              }
            }
          }
        }
      }
    }
  },
  {
    $set: {
      dates: {
        $map: {
          input: "$dates",
          as: "d",
          in: {
            $mergeObjects: [
              "$$d",
              {
                tasks: {
                  $filter: {
                    input: "$data",
                    cond: { $eq: [ "$$d.date", "$$this.date" ] }
                  }
                }
              }
            ]
          }
        }
      }
    }
  },
  {
    $project: {
      data: {
        $map: {
          input: "$dates",
          in: {
            $cond: {
              if: { $eq: [ "$$this.tasks", [] ] },
              then: "$$this",
              else: { $first: "$$this.tasks" }
            }
          }
        }
      }
    }
  },
  { $unwind: "$data" },
  { $replaceWith: "$data" }
])

Mongo Playground

- nimrodserok
- October 23, 2022 at 8:56 am
- 0 votes
0
The answer of @WernfriedDomscheitAnother has a downside of grouping together all the documents in the collection, creating one large document, while a document has a size limit. A variation on it, without this downside, can be:
1. $match only the relevant document, same as in your current query
2. Use $facet to handle the case of no relevant documents at all. This will allow you to group all the relevant documents as you did in your query, but to keep a working-document even if there are any.
3. Add the relevant dates inside an array (since we use $facet this will happen even if the first match is empty)
4. Concatenate the array of matched data with the array of empty entries, use the real-data first.
5. $unwind the separate the documents by date, and $group again by date to remove the duplicates.
6. Format the result
```
db.collection.aggregate([
  {$match: {date: {$gte: new Date("2022-10-19"), $lte: new Date("2022-10-26")}}},
  {$facet: {
      data: [
        {$group: {_id: "$date", tasks: {$push: "$$ROOT"}}},
        {$project: {date: "$_id", tasks: 1}}
      ]
  }},
  {$addFields: {
      dates: {$map: {
          input: {$range: [0, 8]},
          // maybe more dynamic with $dateDiff -> { $dateDiff: { startDate: new Date("2022-10-19"), endDate: new Date("2022-10-26") }, unit: "day" } }
          in: {
            date: {$dateAdd: {
                startDate: ISODate("2022-10-19T00:00:00.000Z"),
                unit: "day",
                amount: "$$this"
            }},
            tasks: []
          }
      }}
  }},
  {$project: {data: {$concatArrays: ["$data", "$dates"]}}},
  {$unwind: "$data"},
  {$group: {_id: "$data.date", "tasks": {$first: "$data.tasks"}}},
  {$project: { _id: 0, date: "$_id", tasks: 1 }},
  {$sort: { date: -1 }},
])
```
See how it works on the playground example
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.