skip to Main Content

I have a collection that pulls state data from an HRIS on a daily basis. We can use state data for all kinds of interesting things, like predictive analytics.

However, state change data grows boundlessly (currently, that’s another piece of tech debt that we need to tackle). I would like to start however by creating a view that takes only the most recent record from the collection. So I have the following view creation code:

db.rawEmployeeStatus.aggregate().match({dateOfStateCapture:   
                                      db.rawEmployeeStatus.find({})
                                                          .sort({$natural : -1})
                                                          .limit(1)
                                                          .toArray()[0]
                                                          .dateOfStateCapture})      
                                .saveAsView("employeeStatusCurrentState",{dropIfExists:false})

However, the problem with this view, is that it gets most recent record at the time the view is created, then uses that data for the view. When ideally, the data in the view should always be populated from the most recent records in the collection.

3

Answers


  1. I may be wrong, but I’m under the impression that mongo views are always static at the point of creation or refreshing (both standard and on-demand materialised views) based on the source aggregation result.

    If you look at the documentation for on-demand materialised views, there are typically 3 different independent steps

    1. Define the On-Demand Materialized View
    2. Perform Initial Run
    3. Refresh Materialized View

    I believe that you are wanting to refreshing of materalised view to be updated when there is an insert into the collection that it was built from. However, this is not the case. The materialised view is essentially an virtual collection based on an aggregation, it doesn’t know the impact of a change of the source collection(s) data entities that it may have on itself. It would be nice if it does! But your view is normally an optimised subset based so an insert or update may not change the view’s (eg, virtual aggregation result).

    So, in order for your view to be updated after an insert, you have to always follow the last step of refreshing the view (eg, run the aggregation query again). Obviously, doing this after a refresh is fairly aggressive, and certainly not performant.

    In your situation of maintaining a view of the latest state, it may be worth having an application level callback that checks if you have a cause to update your view after each time you do an insert/update. Again, this would depend on what the records look like in your case. If it’s a simple document filter because all updates are inserts, then you could probably refresh on each insert. If the status updates are in an array attribute of a single record, then not all updates may trigger a refresh etc.

    Login or Signup to reply.
  2. If the dateOfStateCapture field is what identifies the most recent document in the collection, then a view such as this might meet your needs:

    db.createView(
      "employeeStatusCurrentState", 
      "rawEmployeeStatus", 
      [
        { $sort: { dateOfStateCapture: -1 } },
        { $limit: 1 }
      ], 
    );
    

    Alternatively, and assuming that you are using automatically generated ObjectIds (which increase in value) for the _id field, then the following view would fit your needs:

    db.createView(
      "employeeStatusCurrentState", 
      "rawEmployeeStatus", 
      [
        { $sort: { _id: -1 } },
        { $limit: 1 }
      ], 
    );
    

    Here is a quick demonstration of the latter working:

    > db.employeeStatusCurrentState.find()
    [ { _id: 3, val: 'c' } ]
    >
    > db.rawEmployeeStatus.insertOne({ _id: 4, val: 'd' })
    { acknowledged: true, insertedId: 4 }
    >
    > db.employeeStatusCurrentState.find()
    [ { _id: 4, val: 'd' } ]
    

    The claim in the other answer that views are static is not really correct. As mentioned in that answer, there is a page demonstrating how to materialize the views via $merge, but this is unrelated to their read-only views. There is small section about comparison between the two here.


    Now let’s dig in a little more on the specifics of the behavior and constraints noted in the question. First revisiting this from the comments:

    What is saveAsView()?

    it creates a view…

    Sure, it is pretty obvious that that’s the general idea. But the specifics here are incredibly important for answering your question. What type of view (e.g. is it materialized) and where is it being created at? If, for example, the utility was saving the result of the query (caching/materializing it),

    saveAsView() is not a helper function provided by MongoDB. The official helper is createView() hence my question.

    Through some additional research, I think saveAsView() is functionality that is offered by the NoSQLBooster for MongoDB (previously "MongoBooster") utility beginning with version 3.3. I downloaded and tried the utility and, per the example below, it looks like it does create a non-materialized view in the database. But the details of that view definition is what seems to be causing the confusion and trouble.

    Working through the steps, I began with this single collection in my database:

    test> show collections
    rawEmployeeStatus
    test> db.rawEmployeeStatus.find()
    [
      { _id: 1, dateOfStateCapture: 1 },
      { _id: 2, dateOfStateCapture: 2 },
      { _id: 3, dateOfStateCapture: 3 }
    ]
    

    I then ran the exact command from the question in NoSQLBooster (version 7.1.16). Afterwards I can see the view in my database:

    test> show collections
    employeeStatusCurrentState  [view]
    rawEmployeeStatus
    

    And if I query it, I get a single document as expected:

    test> db.employeeStatusCurrentState.find()
    [ { _id: 3, dateOfStateCapture: 3 } ]
    test>
    

    Indeed as you mentioned, if I insert a new document into rawEmployeeStatus and then query the view again I still get the previous document with _id of 3:

    test> db.rawEmployeeStatus.insertOne({_id:4, dateOfStateCapture:4})
    { acknowledged: true, insertedId: 4 }
    test> db.employeeStatusCurrentState.find()
    [ { _id: 3, dateOfStateCapture: 3 } ]
    

    We can see why when we take a look at the view definition:

    test> db.getCollectionInfos({name:"employeeStatusCurrentState"})
    [
      {
        name: 'employeeStatusCurrentState',
        type: 'view',
        options: {
          viewOn: 'rawEmployeeStatus',
          pipeline: [ { '$match': { dateOfStateCapture: 3 } } ]
        },
        info: { readOnly: true }
      }
    ]
    

    Specifically, look at the pipeline that defines the view:

    [ { '$match': { dateOfStateCapture: 3 } } ]
    

    What has happened here is that the inner portion of your command was executed to resolve to a value prior to being sent to the database for storage in the view. So this:

    db.rawEmployeeStatus.find({})
                                                              .sort({$natural : -1})
                                                              .limit(1)
                                                              .toArray()[0]
                                                              .dateOfStateCapture
    

    Got resolved to just 3 (in my example) and is what got stored in the view definition. This was performed by the NoSQLBooster client as something like .toArray()[0] is not valid aggregation syntax and therefore is not valid for storage in a view.


    To summarize – classic views in MongoDB are not static (materialized). In your particular case, the third party tooling being used created the view in a non-obvious way. This required it to resolve some of the definition to a static value which was provided as part of the view definition itself. Alternative syntax to creating a view that will fit your needs (or at least point you in the right direction) is provided at the top of this answer.

    Login or Signup to reply.
  3. You can use a change stream to listen for changes to the collection and update the view accordingly. Here is a link:
    https://docs.mongodb.com/manual/changeStreams/

    This is because the view creation code you have above only looks at the most recent record at the time the view is created, and does not update the view dynamically as new records are added to the collection.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search