skip to Main Content

I am currently working on an Alert Rule in Log Analytics which is supposed to look back for 90 days, calculate a confidencelimit over that time and then compare today’s value with the limit. If today’s value exceeds the statistic, then the alert rule should fire.

Something like so (pseudo code):

let jobDurationTbl = latencyTbl
    | where name_s contains "latency_metrics" and TimeGenerated >= (now() - 90d)

let statisticsTbl = jobDurationTbl
    | summarize count(), upperConfidenceLimit = avg(jobDuration_ms) + 2 * stdev(jobDuration_ms) by serviceName_s

let alertTbl = latencyTbl
    | join kind=leftouter (jobDurationTbl) on serviceName_s
    | project
        serviceName_s,
        _ResourceId,
        TimeGenerated
    | join kind=leftouter (statisticsTbl) on serviceName_s
    | where jobDuration_ms > upperConfidenceLimit
    | where TimeGenerated == today;

Please note that I can not post the whole query here due to its length and complexity. The idea is that we have a table jobDurationTbl which contains the length of jobs in ms (jobDuration_ms) for each service. Then we create an alertTbl which should only return rows if the jobDuration_ms exceeds the upperConfidenceLimit calculated from the statisticsTbl

My problem is that the 90 days look back keeps getting overridden by the windowSize in the following line: TimeGenerated >= (now() - 90d). It is not clear what Log Analytics does, but I think it replaces the whole (now() - 90d) with the windowSize. I tried different syntaxes like:

YourTable
| where TimeGenerated >= ago(90d)

But still the time range gets overriden by the windowSize. Does someone know how the Microsoft.Insights scheduledQueryRule should be configured in such a case? I want to specify the windowing functionality solely in the query without the interference of any parameters. The evaluation period should be simply once a day.

2

Answers


  1. Chosen as BEST ANSWER

    Ok, I found a solution in the end. Thank you @Jahnavi you gave the missing clue with the binning logic. The final query with sample data looks as follows:

    let Logs = datatable (TimeGenerated:datetime, ServiceName:string, jobDuration:int, _ResourceId:string ) [
       "1/4/2023, 2:33:36.000 PM", "Service A", 8945849, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-prd-ingld",
       "12/4/2023, 2:33:36.000 PM", "Service A", 14, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-prd-ingld",
       "12/5/2023, 3:33:36.000 PM", "Service B", 50, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-prd-ingld",
       "12/5/2023, 3:33:36.000 PM", "Service B", 800, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-prd-ingld",
       "12/6/2023, 1:33:36.000 PM", "Service C", 34982334, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-ingld",
       "1/5/2024, 2:33:36.000 PM", "Service B", 45, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-prd-ingld",
       "1/5/2024, 3:33:36.000 PM", "Service B", 5000, "/subscriptions/**************/resourcegroups/rg-******-prd-ingestion-landing/providers/microsoft.databricks/workspaces/dbw-******-prd-ingld"
    ] ;
    
    let todaysRunTbl = Logs | where TimeGenerated >= startofday(now());
    //todaysRunTbl
    
    let statistiscsTbl = Logs
    | where TimeGenerated <= startofday(now())
    | where TimeGenerated > ago(90d)
    | summarize upperConfidenceLimit = avg(jobDuration) + 3 * stdev(jobDuration) by bin(TimeGenerated, 90d), ServiceName
    | project ServiceName, upperConfidenceLimit;
    
    
    let alertTbl = todaysRunTbl | join kind=inner statistiscsTbl on ServiceName
    | where jobDuration >= upperConfidenceLimit;
    alertTbl
    

    If today were the 1/5/2024, we would get the right result provided we have parameterized the alert correctly. The right result is the last row of the datatable. For reference, if we want to check every 15 minutes if there was an error in the last 1h, we would configure the alert as follows (bicep):

    param windowSize = 'PT1H'
    param evaluationFrequency = 'PT15M'
    
    param timeAggregation = 'Count'
    
    param operator = 'GreaterThanOrEqual'
    param threshold = 1
    
    param dimensions  = [
      {
        name: 'jobDuration'
        operator: 'Include'
        values: [ '*' ]
      }
      {
        name: 'upperConfidenceLimit'
        operator: 'Include'
        values: [ '*' ]
      }
    ]
    
    param numberOfEvaluationPeriods = 1
    param minFailingPeriodsToAlert = 1
    

    We would get an alert for every row returned by the query.


  2. After a workaround on your issue, below are the results using Microsoft.Insights/scheduledQueryRules as well as KQL query.

    Using Microsoft.Insights/scheduledQueryRules bicep resource provider, you can use below code to create a scheduled query rule.

    param alert string = 'myjalert'
    param location string 
    param Description string = 'This is a metric alert'
    param Severity int = 3
    param isEnabled bool = true
    param autoMitigate bool = true
    param resourceId string = '/subscriptions/xxx/resourceGroups/xxx/providers/Microsoft.Compute/virtualMachines/xxx'
    param query string = 'Perf | where ObjectName == "Processor" and CounterName == "% Processor Time"'
    param metricMeasureColumn string = 'AggregatedValue'
    param operator string = 'GreaterThan'
    param threshold int = 0
    param numberOfEvaluationPeriods int = 1
    param minFailingPeriodsToAlert int = 1
    param timeAggregation string = 'Average'
    param windowSize string = 'PT5M'
    param actionGroupId string = '/subscriptions/xx/resourceGroups/xxx/providers/microsoft.insights/actiongroups/xx'
    
    resource alert 'Microsoft.Insights/scheduledQueryRules@2021-08-01' = {
      name: alert
      location: location
      tags: {}
      properties: {
        description: Description
        severity: Severity
        enabled: isEnabled
        scopes: [
          resourceId
        ]
        evaluationFrequency: evaluationFrequency
        windowSize: windowSize
        criteria: {
          allOf: [
            {
              query: query
              metricMeasureColumn: metricMeasureColumn
              dimensions: []
              operator: operator
              threshold: threshold
              timeAggregation: timeAggregation
              failingPeriods: {
                numberOfEvaluationPeriods: numberOfEvaluationPeriods
                minFailingPeriodsToAlert: minFailingPeriodsToAlert
              }
            }
          ]
        }
        autoMitigate: autoMitigate
        actions: {
          actionGroups: [
             actionGroupId
          ]
          customProperties: {
            key1: 'xx'
            key2: 'xx'
          }
        }
      }
    }
    

    enter image description here

    Refer here for more relevant MS Doc sample Bicep templates.

    Or

    Modify your KQL query by using avg() aggregated function to meet the expected requirement as shown.

    YourTable 
    | where TimeGenerated >= ago(90d) 
    | summarize value = avg(<Metric>) by bin(TimeGenerated, 1d) 
    | project TimeGenerated, value
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search