
Azure Stream Analytics is used to distribute JSON data coming in from Azure IoT Hub into tables of an Azure SQL database.

Recently there have been several large watermark delays that completely broke the pipeline.

When it works normally, the delay is 12 seconds, which is more or less acceptable.

The overall architecture is as follows: the IoT Edge devices send JSON messages to IoT Hub whenever they have data; each message then flows to Stream Analytics as input; a query runs on Stream Analytics; and the outputs are SQL database tables. The SQL database serves data to Grafana on request, with several queries coming from Grafana dashboards for monitoring purposes. Hence there are writes from Stream Analytics and reads from Grafana (and sometimes from Azure Data Studio).
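
For illustration, the Stream Analytics query has roughly the following shape (a minimal sketch; the input/output aliases and column names are assumptions, not the actual job definition):

    -- Sketch of the Stream Analytics job query.
    -- [iothub-input] and [sql-table-output] stand in for the real aliases;
    -- DeviceId, EventTime, Payload are assumed JSON fields.
    SELECT
        DeviceId,
        EventTime,
        Payload
    INTO [sql-table-output]                  -- Azure SQL Database table output
    FROM [iothub-input] TIMESTAMP BY EventTime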

Observations of metrics have shown that:

  • The number of messages sent from the IoT Edge devices does not affect the delay.
  • The delays happen when there are too many database sessions (at the normal 12-second delay there are 8 sessions; more than 8 sessions leads to a delay and a further increase in sessions, e.g., up to 130).
  • Sometimes even a small surplus of sessions triggers a delay.
  • The database Data IO also appears to affect the delay.
  • When there are delays, the database DTU usage is high, but not always.
  • During delays, the CPU% of Stream Analytics drops.

The services used have the following characteristics:

  • DB: 200 DTUs
  • DB storage: 300 GB
  • Stream Analytics: 3 streaming units
  • DB MAXDOP = 0
  • DB CPUs: 4

The Grafana queries are very simple: they read data for a 15-minute range. The data sampling interval is not frequent: at most one sample every 5 seconds.
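
For reference, a typical panel query would look roughly like this (table and column names are assumptions; $__timeFilter is Grafana's built-in time-range macro for SQL data sources):

    -- Hypothetical shape of a Grafana panel query over the dashboard's 15-minute range
    SELECT EventTime AS time, DeviceId, Value
    FROM dbo.Telemetry
    WHERE $__timeFilter(EventTime)   -- expands to EventTime BETWEEN <from> AND <to>
    ORDER BY EventTime;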

There are no data conversion or runtime errors in Stream Analytics.

There are dashboards using data from a different database with 2 CPUs and a data frequency of 1 second. They work properly: their Stream Analytics job has no delay even when the Grafana query time range is 12 hours.

Also, according to Query Performance Insight on the SQL database panel in the Azure portal, the longest-running queries covered ranges of only 5-15 minutes, yet still took too much time.
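
As a cross-check outside the portal, the longest-running active requests can also be listed directly with standard DMVs (not specific to this setup):

    -- Show the ten longest-running active requests and their SQL text
    SELECT TOP 10
        r.session_id,
        r.status,
        r.total_elapsed_time AS elapsed_ms,
        t.text AS query_text
    FROM sys.dm_exec_requests AS r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
    ORDER BY r.total_elapsed_time DESC;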

The problematic Grafana dashboards have 3-6 panels, while the working dashboard has more than 10 panels.

I tracked down the following error among the Stream Analytics outputs (I studied the linked page carefully and it didn't help):

Encountered error trying to write 1 event(s): Resource ID : 1. The
request limit for the database is 400 and has been reached. See
‘https://docs.microsoft.com/azure/azure-sql/database/resource-limits-logical-server’
for assistance.

The strange thing is that the same pipeline, with the same number of Grafana dashboards and the same number of messages from IoT Hub, usually works with the regular 12-second delay (which is also expected to be lower, but at least it is something). But sometimes I have very bad days with large delays (about 20% of the time).

I have seen people with a similar problem on other forums, but no decent solution. I also couldn't solve it using ChatGPT.

There were several attempts to reduce the delay by KILL-ing all running database queries, most of them from Grafana. (Previously Grafana was also always available to other users who queried a lot; now those other users are disabled, yet the Grafana queries seem to have become very long.)
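
For context, the KILL attempts looked roughly like this ('grafana_reader' is a placeholder for whatever login Grafana actually connects with):

    -- List active sessions belonging to the Grafana login
    SELECT s.session_id, s.login_name, r.total_elapsed_time
    FROM sys.dm_exec_sessions AS s
    JOIN sys.dm_exec_requests AS r ON r.session_id = s.session_id
    WHERE s.login_name = 'grafana_reader';   -- assumed login name

    -- KILL takes a literal session id, one at a time
    KILL 123;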

How can I solve this problem without increasing the number of DTUs (paying more)?
Should I increase the number of CPUs?

2 Answers


  1. Chosen as BEST ANSWER

    Azure Stream Analytics reports watermark delay because of the long-running queries issued by the Grafana dashboards.

    The Grafana dashboard is used to visualize data from the Azure SQL database. Each panel on the dashboard executes a query that reads data from the database.

    Speeding up the queries by creating indexes made them run faster, which eliminated the watermark delays in Stream Analytics.
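
    As an illustration, the indexes were of the following general shape (a sketch with assumed table and column names, since the panels filter on a time range):

        -- Hypothetical index supporting Grafana's time-range filters;
        -- INCLUDE covers the columns the panels select, so the range seek
        -- avoids key lookups back into the base table.
        CREATE NONCLUSTERED INDEX IX_Telemetry_EventTime
        ON dbo.Telemetry (EventTime)
        INCLUDE (DeviceId, Value);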


  2. The delay is caused by the number of sessions. When there are more than 8 sessions, the delay increases.

    Reduce the partition count for each streaming node to reduce the input data each node has to process.


    You can double the SUs allocated, increasing the streaming node count from 3 to 6, so that each streaming node handles two partitions. Or you can quadruple the SUs so that each streaming node handles data from one partition.

    You can also repartition your input with more partitions to reduce the amount of data in each partition.
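
    A minimal sketch of such a repartitioning step in the Stream Analytics query language (the input alias, key column, and partition count here are assumptions):

        -- Reshuffle the input across more, smaller partitions before writing out
        WITH RepartitionedInput AS (
            SELECT *
            FROM [iothub-input]
            PARTITION BY DeviceId INTO 16
        )
        SELECT *
        INTO [sql-output]
        FROM RepartitionedInput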

    Check Analyze Stream Analytics job performance by using metrics and dimensions for details.

    Regarding the error message you encountered, the request limit for the database has indeed been reached: at 200 DTUs (the S4 tier), Azure SQL Database caps concurrent workers (requests) at 400, so piled-up Grafana panel queries plus the Stream Analytics writes can exhaust that limit.
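
    You can watch how close the database is to that limit with a standard DMV query (illustrative, not specific to this setup):

        -- Current concurrent requests; the S4 (200 DTU) tier caps workers at 400
        SELECT COUNT(*) AS active_requests
        FROM sys.dm_exec_requests
        WHERE status IN ('running', 'runnable', 'suspended');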

    Increasing the number of DTUs or CPUs would improve performance, but it would also increase the cost.


    For more information, refer to repartitioning-azure-stream-analytics-jobs and Resource management in Azure SQL.
