The problem we are experiencing is that a Service Bus queue-triggered function app, on rare occasions (it is otherwise up roughly 99.999% of the time), stops doing its job. The queue-monitoring part of the function app simply stops working, even though the function app still shows as running. We have found no errors (in the function app logs, Application Insights, Service Bus logs, etc.) that explain why the function app no longer recognizes new messages in the Service Bus queue. Restarting the function app causes the messages in the queue to be processed.
We have seen this behavior in both our production and UAT/testing environments for about two of our Service Bus trigger function apps; however, we have other function apps of the same trigger type that have yet to exhibit this behavior. The only difference between the two environments is that we use a Premium Service Bus for production.
So, the million-dollar question is: why do the function apps stop seeing new messages in the queues they are monitoring until they are restarted, given that they are not on a Consumption plan and are configured to be always on?
Production:
  Function App:
    Runtime version: ~4
    .NET version: .NET 6 (LTS) Isolated (I know, we are going to 8 soon. 🙂)
    Type: Service Bus Trigger
    Always On setting: True
    Number of functions: 1
    Storage account: specific only to the function app
  App Service Plan:
    Type: P2v3
    Number of apps: 27
  Service Bus:
    Type: Premium
    Queue monitored:
      Session enabled: false
We have a ticket open with Microsoft, active for the last two days, but they do not have a solution at this point. Our interactions with their support team members have so far confirmed that our setup is coded, implemented, and configured correctly.
2 Answers
Issue:
Why a problem occurred:
Mitigation:
Running with at least two instances of the app service plan:
schedule Microsoft has in place, where it will ensure not more than one instance is being updated at the same time.
Using Azure Health Check:
the queue, this is not enough, because the processing of the messages may be time sensitive enough that relying on the Health Check to move to another instance, once the current one is considered unhealthy, is not optimal.
Code Solution:
Azure Solution:
Conclusion:
So, in essence, our problem was that an update occurred via Microsoft maintenance, and our environment was not able to cope when that update caused a problem. There is no way to be 100% safe when dealing with updates, but by keeping at least two instances active we should be able to eliminate future problems related to maintenance updates. I am also exploring an Azure Monitor alert to inform us if messages stay in the queue longer than we would expect. I'll explore the code solution if Azure Monitor does not work for our case.
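To make the "messages stay in the queue longer than we would expect" check concrete, here is a minimal sketch in Python of the decision logic that such an alert or a code-side watchdog might apply. It is not our production code and not an Azure API: `QueueSample`, `StallDetector`, and the `processed_total` counter are hypothetical names made up for illustration. It assumes you can periodically sample the queue's active message count and some running count of completed messages from your own telemetry.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class QueueSample:
    """One observation of the queue (hypothetical shape, not an Azure type)."""
    taken_at: datetime
    active_messages: int   # messages currently waiting in the queue
    processed_total: int   # running count of messages the function has completed


class StallDetector:
    """Reports a stall when messages are waiting but nothing was processed for max_idle."""

    def __init__(self, max_idle: timedelta) -> None:
        self.max_idle = max_idle
        self._last_progress_at: Optional[datetime] = None
        self._last_processed_total: Optional[int] = None

    def observe(self, sample: QueueSample) -> bool:
        """Record a sample and return True if the consumer looks stalled."""
        made_progress = (
            self._last_processed_total is None
            or sample.processed_total > self._last_processed_total
        )
        if made_progress or sample.active_messages == 0:
            # Work was done, or there is nothing to do: reset the idle clock.
            self._last_progress_at = sample.taken_at
            self._last_processed_total = sample.processed_total
            return False
        # Messages are waiting and the processed count has not moved.
        return sample.taken_at - self._last_progress_at >= self.max_idle


if __name__ == "__main__":
    detector = StallDetector(max_idle=timedelta(minutes=10))
    start = datetime(2024, 1, 1, 12, 0)
    for minutes, active, processed in [(0, 5, 100), (5, 8, 100), (10, 12, 100)]:
        sample = QueueSample(start + timedelta(minutes=minutes), active, processed)
        print(minutes, detector.observe(sample))
```

Tracking a processed count rather than only the queue depth is a deliberate choice in this sketch: a busy queue can stay non-empty while the function is healthy, so depth alone would raise false alarms. The 10-minute threshold is an arbitrary placeholder.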
Shout-out/Response to others:
Finally, thank you, Vivek, for your suggestions. In this case, a timer trigger used just to check whether the queue-monitoring function app was idle or not running would not work, because the function app is set to always be running, and it was active; it had simply lost the ability to see new messages in the queue for processing.
The Always On option has been removed from the Premium plan; in Azure Functions it is now only available on an App Service Plan. This is causing your function to go idle, so the Service Bus trigger does not fire.
To keep your function always on, you can use a trigger as a warm-up, which keeps the function active. The best option is a Timer Trigger. For Premium plan reference, check this MS Document.
OR
You can use an App Service Plan and enable the Always On option. For reference, check this SO link.