So here is the scenario:
- We have a m_reading table in which we will ingest 24 readings every day.
- Out of 24 readings, 23 are of current day whereas 1 is of previous day.
- materialised view is supposed to aggregate the days data.
- Business user is okay a partial aggregation for the current day, but once the data is available the next day it has to be refreshed.
Here we are planning to use backfill property.
From the documentation We see that we can use effectiveDateTime
But in the examples we see that the effectiveDateTime is always hard-coded.
e.g.
.create async materialized-view with (backfill=true, effectiveDateTime=datetime(2019-01-01)) CustomerUsage on table T
{
T
| extend Day = bin(Timestamp, 1d)
| summarize count(), dcount(User), max(Duration) by Customer, Day
}
Can we have that passed from somewhere or even be dependent on the current load behaviour.
As in our example if the data currently being loaded is for 28-Apr-2023, effectiveDateTime should be 27-Apr-2023.
Thanks in advance for the pointers.
===================================
Update1:
Thanks @Anvera for the document
Based on response from Anvera, i conclude that:
- Backfill is only applicable while creating the MV
- MV has the aggregation done on the newly ingested data (delta data). reference here.
This brings me to the question, how to force the MV to refresh (or re-aggregate) the data taking into account already loaded (yesterday’s data in our case) into account as well
2
Answers
In your scenario, these properties are not applicable. These properties are needed for the creation of the materialized view. When creating the materialized view, you can decide whether you want to "materialize" the data that already exist in the source table. If yes, you specify the property "backfill=true", which will materialize all of the data in the table. If you want to limit it to a specific time, you can also specify the effectiveDateTime property.
Please note that all the historical data must be in the cache, so you will need to make sure that you adjust the caching policy appropriately.
See more details in this section of the docs
Regarding the case when the data arrives late, then as Yifat said, it will be aggregated correctly based on the view definition, if the aggregation is based on the Timestamp column, then regardless of when the data arrived, it will update the correct aggregation record that belongs to the specific Day that the Timestamp belongs to.
The materialized view will always aggregate the data according to the view definition. If there is late arriving data for a previous day that has already been materialized (aggregated), the materialization process will update that day with the new data. You don’t need to force it to re-aggregate, it will happen automatically.