Azure Alerts - ScheduledQueryRule - Set Time Range in Query only

DataBach
January 11, 2024
212 views
0 votes
2 Answers

I am currently working on an Alert Rule in Log Analytics which is supposed to look back for 90 days, calculate a confidencelimit over that time and then compare today’s value with the limit. If today’s value exceeds the statistic, then the alert rule should fire.

Something like so (pseudo code):

let jobDurationTbl = latencyTbl
    | where name_s contains "latency_metrics" and TimeGenerated >= (now() - 90d)

let statisticsTbl = jobDurationTbl
    | summarize count(), upperConfidenceLimit = avg(jobDuration_ms) + 2 * stdev(jobDuration_ms) by serviceName_s

let alertTbl = latencyTbl
    | join kind=leftouter (jobDurationTbl) on serviceName_s
    | project
        serviceName_s,
        _ResourceId,
        TimeGenerated
    | join kind=leftouter (statisticsTbl) on serviceName_s
    | where jobDuration_ms > upperConfidenceLimit
    | where TimeGenerated == today;

Please note that I can not post the whole query here due to its length and complexity. The idea is that we have a table jobDurationTbl which contains the length of jobs in ms (jobDuration_ms) for each service. Then we create an alertTbl which should only return rows if the jobDuration_ms exceeds the upperConfidenceLimit calculated from the statisticsTbl

My problem is that the 90 days look back keeps getting overridden by the windowSize in the following line: TimeGenerated >= (now() - 90d). It is not clear what Log Analytics does, but I think it replaces the whole (now() - 90d) with the windowSize. I tried different syntaxes like:

YourTable
| where TimeGenerated >= ago(90d)

But still the time range gets overriden by the windowSize. Does someone know how the Microsoft.Insights scheduledQueryRule should be configured in such a case? I want to specify the windowing functionality solely in the query without the interference of any parameters. The evaluation period should be simply once a day.

Answers

Chosen as BEST ANSWER

Ok, I found a solution in the end. Thank you @Jahnavi you gave the missing clue with the binning logic. The final query with sample data looks as follows:

let Logs = datatable (TimeGenerated:datetime, ServiceName:string, jobDuration:int, _ResourceId:string ) [
   "1/4/2023, 2:33:36.000 PM", "Service A", 8945849, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-prd-ingld",
   "12/4/2023, 2:33:36.000 PM", "Service A", 14, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-prd-ingld",
   "12/5/2023, 3:33:36.000 PM", "Service B", 50, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-prd-ingld",
   "12/5/2023, 3:33:36.000 PM", "Service B", 800, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-prd-ingld",
   "12/6/2023, 1:33:36.000 PM", "Service C", 34982334, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-ingld",
   "1/5/2024, 2:33:36.000 PM", "Service B", 45, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-prd-ingld",
   "1/5/2024, 3:33:36.000 PM", "Service B", 5000, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-prd-ingld"
] ;

let todaysRunTbl = Logs | where TimeGenerated >= startofday(now());
//todaysRunTbl

let statistiscsTbl = Logs
| where TimeGenerated <= startofday(now())
| where TimeGenerated > ago(90d)
| summarize upperConfidenceLimit = avg(jobDuration) + 3 * stdev(jobDuration) by bin(TimeGenerated, 90d), ServiceName
| project ServiceName, upperConfidenceLimit;


let alertTbl = todaysRunTbl | join kind=inner statistiscsTbl on ServiceName
| where jobDuration >= upperConfidenceLimit;
alertTbl

If today were the 1/5/2024, we would get the right result provided we have parameterized the alert correctly. The right result is the last row of the datatable. For reference, if we want to check every 15 minutes if there was an error in the last 1h, we would configure the alert as follows (bicep):

param windowSize = 'PT1H'
param evaluationFrequency = 'PT15M'

param timeAggregation = 'Count'

param operator = 'GreaterThanOrEqual'
param threshold = 1

param dimensions  = [
  {
    name: 'jobDuration'
    operator: 'Include'
    values: [ '*' ]
  }
  {
    name: 'upperConfidenceLimit'
    operator: 'Include'
    values: [ '*' ]
  }
]

param numberOfEvaluationPeriods = 1
param minFailingPeriodsToAlert = 1

We would get an alert for every row returned by the query.

(Edit)

After a workaround on your issue, below are the results using Microsoft.Insights/scheduledQueryRules as well as KQL query.

Using Microsoft.Insights/scheduledQueryRules bicep resource provider, you can use below code to create a scheduled query rule.

param alert string = 'myjalert'
param location string 
param Description string = 'This is a metric alert'
param Severity int = 3
param isEnabled bool = true
param autoMitigate bool = true
param resourceId string = '/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Compute/virtualMachines/xxx'
param query string = 'Perf | where ObjectName == "Processor" and CounterName == "% Processor Time"'
param metricMeasureColumn string = 'AggregatedValue'
param operator string = 'GreaterThan'
param threshold int = 0
param numberOfEvaluationPeriods int = 1
param minFailingPeriodsToAlert int = 1
param timeAggregation string = 'Average'
param windowSize string = 'PT5M'
param actionGroupId string = '/subscriptions/xx/resourceGroups/xxx/providers/microsoft.insights/actiongroups/xx'

resource alert 'Microsoft.Insights/scheduledQueryRules@2021-08-01' = {
  name: alert
  location: location
  tags: {}
  properties: {
    description: Description
    severity: Severity
    enabled: isEnabled
    scopes: [
      resourceId
    ]
    evaluationFrequency: evaluationFrequency
    windowSize: windowSize
    criteria: {
      allOf: [
        {
          query: query
          metricMeasureColumn: metricMeasureColumn
          dimensions: []
          operator: operator
          threshold: threshold
          timeAggregation: timeAggregation
          failingPeriods: {
            numberOfEvaluationPeriods: numberOfEvaluationPeriods
            minFailingPeriodsToAlert: minFailingPeriodsToAlert
          }
        }
      ]
    }
    autoMitigate: autoMitigate
    actions: {
      actionGroups: [
         actionGroupId
      ]
      customProperties: {
        key1: 'xx'
        key2: 'xx'
      }
    }
  }
}

Refer here for more relevant MS Doc sample Bicep templates.

Modify your KQL query by using avg() aggregated function to meet the expected requirement as shown.

YourTable 
| where TimeGenerated >= ago(90d) 
| summarize value = avg(<Metric>) by bin(TimeGenerated, 1d) 
| project TimeGenerated, value

Please signup or login to give your own answer.

Click here to cancel reply.

Azure Alerts – ScheduledQueryRule – Set Time Range in Query only

Answers