skip to Main Content

I have a large set of records in this schema:

{
   username: "[email protected]",
   timestamp: 1646006400000 //unix timestamp in milliseconds
   amount: 100
}

Given an arbitrary range of dates, I need to find the change in average amount value vs a similar prior range of dates.

For example, if the input is March, 10 to March, 15, I need to calculate the average in that period and the difference with the average in the March, 5 to March 10 period.

This is how I planned it:

  1. given the from_date & to_date, calculate the interval and subtract to get prior_from_date & prior_to_date
  2. match all the records from prior_from_date to to_date
  3. use setWindowFields to create a flag prior on the prior records
  4. group by username and then group by prior flag to calculate averages

I can compute step 1 on the app side (PHP). I’m able to get the average using $avg but am unable to figure out how to do step 3. This is what I have so far:

{
  partitionBy: '$username',
  sortBy: {
    timestamp: 1
  },
  output: {
    prior: {
      $sum: '$prior',
      window: {
        range: [
          -86400000, // milliseconds in a day - may be multiply by interval from PHP?
          0
        ]
      }
    }
  }
}

This is the final desired output:

{
 {
   username:"[email protected]",
   prior: {
      average: 50
   },
   current: {
      average: 74
   }
 },
 {
   username:"[email protected]",
   prior: {
       average: 73
   },
   current: {
       average: 33
   }
 }
}

2

Answers


  1. Since you are using a $match step and you already have the timestamps calculated, you can just group using $addFields with a condition to get the same effect:

    db.collection.aggregate([
      {
        $match: {
          timestamp: {
            $gte: 1646438400000,
            $lt: 1647302400000
          }
        }
      },
      {
        "$addFields": {
          "period": {
            "$cond": [
              {$gt: ["$timestamp", 1646870400000]},
              "current",
              "prior"
            ]
          }
        }
      },
      {
        $group: {
          _id: {period: "$period", username: "$username"},
          average: {$avg: "$amount"}
        }
      },
      {
        $project: {
          username: "$_id.username",
          "data.k": "$_id.period",
          "data.v": {average: "$average"},
          _id: 0
        }
      },
      {
        $group: {_id: "$username",
          data: {$push: "$data"}
        }
      },
      {
        $project: {
          "data": {"$arrayToObject": "$data"}
        }
      },
      {
        $project: {
          _id: 1,
          current: "$data.current",
          prior: "$data.prior"
        }
      }
    ])
    

    As you can see on the playground :
    which returns this:

        [
      {
        "_id": "[email protected]",
        "current": {
          "average": 42.5
        },
        "prior": {
          "average": 68
        }
      },
      {
        "_id": "[email protected]",
        "current": {
          "average": 55
        },
        "prior": {
          "average": 37
        }
      }
    ]
    
    Login or Signup to reply.
  2. Your requirements are not so clear to me, but could be this one:

    db.collection.aggregate([
       { $set: { timestamp: { $toDate: "$timestamp" } } },
       {
          $setWindowFields: {
             partitionBy: "$username",
             sortBy: { timestamp: 1 },
             output: {
                average: {
                   $avg: "$amount",
                   window: { range: [-5, 5], unit: "day" }
                }
             }
          }
       }
    ])
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search