
We’ve recently created a new Standard 1 GB Azure Redis cache dedicated to distributed locking, separate from our main Redis cache. We did this to improve stability on the main cache, which has been a long-standing issue, and it seems to have helped significantly.

On our new cache, we observe bursts of ~100 errors within the same few seconds every 1 – 3 days. The errors are either:

No connection is available to service this operation (StackExchange.Redis error)

Or:

Could not acquire distributed lock: Conflicted (RedLock.net error)

Since the errors come from two different packages, I suspect the Redis cache itself is the problem here. None of the stats during this time look out of the ordinary, and the workload should fit comfortably in the Standard 1 GB size.

I’m guessing this could be caused by the tier’s advertised “Low” network performance. Is that the likely cause?

2 Answers


  1. Your theory sounds plausible.

    Checking for insufficient network bandwidth

    Azure publishes a table of the maximum observed bandwidth for each pricing tier. Look up the figure for your SKU, then head over to your Redis blade in the Azure Portal and choose Metrics. Set the aggregation to Max and look at the sum of Cache Read and Cache Write; this is your total bandwidth consumed. Overlay that sum against the time period when you’re experiencing the errors and see whether the problem is network throughput. If it is, scale up.
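
    If you’d rather pull the same numbers programmatically than eyeball the portal chart, here is a minimal sketch using the Azure.Monitor.Query SDK. The resource ID is a placeholder, and the metric names ("cacheRead", "cacheWrite") are assumptions you should verify against the names shown on your cache’s Metrics blade:

    using System;
    using Azure.Identity;
    using Azure.Monitor.Query;
    using Azure.Monitor.Query.Models;

    // Minimal sketch: pull the per-minute maximum cache read/write bandwidth
    // for the last day so it can be lined up against the error bursts.
    // The resource ID and metric names below are placeholders/assumptions.
    class BandwidthCheck
    {
        static void Main()
        {
            string resourceId =
                "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Cache/Redis/<cache-name>";

            var client = new MetricsQueryClient(new DefaultAzureCredential());
            MetricsQueryResult result = client.QueryResource(
                resourceId,
                new[] { "cacheRead", "cacheWrite" },
                new MetricsQueryOptions
                {
                    TimeRange = new QueryTimeRange(TimeSpan.FromDays(1)),
                    Granularity = TimeSpan.FromMinutes(1),
                    Aggregations = { MetricAggregationType.Maximum }
                }).Value;

            foreach (MetricResult metric in result.Metrics)
                foreach (MetricTimeSeriesElement series in metric.TimeSeries)
                    foreach (MetricValue point in series.Values)
                        // Sum read + write at each timestamp and compare the total
                        // against the advertised bandwidth for your SKU.
                        Console.WriteLine($"{metric.Name} {point.TimeStamp:u} max={point.Maximum}");
        }
    }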

    Checking server load

    Also on the Metrics tab, take a look at Server Load. This is the percentage of cycles in which Redis is busy processing requests rather than sitting idle; at 100%, Redis cannot keep up with new requests and you will see timeouts. If that’s the case, scale up.

    Reusing ConnectionMultiplexer

    You can also run out of connections to a Redis server if you’re spinning up a new instance of StackExchange.Redis.ConnectionMultiplexer per request. The connection limits for each SKU are listed on the Azure Cache for Redis pricing page. To see whether you’re exceeding the maximum allowed connections for your SKU, go to the Metrics tab, select the Max aggregation, and choose Connected Clients as your metric.
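
    If that’s what is happening, the usual fix is to create the multiplexer once and share it for the lifetime of the application. Here is a minimal sketch assuming the common Lazy<T> singleton pattern; the class name and connection string are placeholders:

    using System;
    using StackExchange.Redis;

    // Minimal sketch: one ConnectionMultiplexer shared by the whole application
    // instead of a new connection per request. The class name and connection
    // string are placeholders.
    public static class RedisConnection
    {
        private static readonly Lazy<ConnectionMultiplexer> LazyConnection =
            new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect(
                "<your-cache>.redis.cache.windows.net:6380,password=<access-key>,ssl=True,abortConnect=False"));

        // Every caller reuses the same multiplexer.
        public static ConnectionMultiplexer Connection => LazyConnection.Value;
    }

    // Usage: var value = RedisConnection.Connection.GetDatabase().StringGet("some-key");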

    Thread Exhaustion

    This doesn’t sound like your error, but I’ll include it for completeness in this Rogue’s Gallery of Redis issues, and it comes into play with Azure Web Apps. By default, the thread pool starts with only a small number of threads that can be allocated to work immediately (the minimum is the processor count, so often just a few on a small plan). Once you need more than that minimum, new threads are doled out at a rate of roughly one per 500 ms. So if you dump a ton of requests on a Web App in a short period of time, you can end up queuing work and eventually having requests dropped before they even get to Redis. To see whether this is a problem, go to Metrics for your Web App, choose Threads, and set the aggregation to Max. If you see a huge spike in a short period that lines up with your trouble, you’ve found a culprit. Resolutions include making proper use of async/await; if that gets you no further, call ThreadPool.SetMinThreads with a higher value, preferably one close to or above the peak thread usage you see during those bursts.
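
    As a rough sketch of that last step, you could raise the minimums once at startup (for example in Program.cs or Application_Start). The 200/200 values below are placeholders, not a recommendation; base them on the peak thread count you actually observe during the bursts:

    using System.Threading;

    // Minimal sketch: raise the minimum worker and IOCP thread counts so the
    // pool doesn't throttle growth to roughly one new thread per 500 ms
    // during a burst. The 200/200 values are placeholders.
    ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
    if (minWorker < 200 || minIocp < 200)
    {
        ThreadPool.SetMinThreads(workerThreads: 200, completionPortThreads: 200);
    }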

  2. Rob has some great suggestions, but I did want to add some information on troubleshooting traffic bursts and poor ThreadPool settings. Please see: Troubleshoot Azure Cache for Redis client-side issues

    Bursts of traffic combined with poor ThreadPool settings can result in delays in processing data already sent by the Redis Server but not yet consumed on the client side.

    Monitor how your ThreadPool statistics change over time using an example ThreadPoolLogger (a minimal sketch follows the bullet points below). You can use TimeoutException messages from StackExchange.Redis, like the one below, to investigate further:

    System.TimeoutException: Timeout performing EVAL, inst: 8, mgr: Inactive, queue: 0, qu: 0, qs: 0, qc: 0, wr: 0, wq: 0, in: 64221, ar: 0,
    IOCP: (Busy=6,Free=999,Min=2,Max=1000), WORKER: (Busy=7,Free=8184,Min=2,Max=8191)
    
    • Notice that in the IOCP section and the WORKER section you have a Busy value that is greater than the Min value. This difference means your ThreadPool settings need adjusting.
    • You can also see in: 64221. This value indicates that 64,221 bytes have been received at the client’s kernel socket layer but haven’t yet been read by the application. This backlog typically means that your application (for example, StackExchange.Redis) isn’t reading data from the network as quickly as the server is sending it.
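
    The example ThreadPoolLogger mentioned above isn’t reproduced in this answer, but a minimal periodic logger along these lines (the class name and five-second interval are arbitrary choices) will show whether Busy climbs past Min around the time of your bursts:

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    // Minimal sketch: periodically log Busy/Free/Min for worker and IOCP
    // threads, mirroring the numbers StackExchange.Redis prints in its
    // TimeoutException messages.
    public static class ThreadPoolLogger
    {
        public static void Start(TimeSpan interval, CancellationToken token) =>
            Task.Run(async () =>
            {
                while (!token.IsCancellationRequested)
                {
                    ThreadPool.GetMinThreads(out int minWorker, out int minIocp);
                    ThreadPool.GetMaxThreads(out int maxWorker, out int maxIocp);
                    ThreadPool.GetAvailableThreads(out int freeWorker, out int freeIocp);

                    // Busy = Max - Free, the same calculation behind the WORKER
                    // and IOCP figures in the timeout message above.
                    Console.WriteLine(
                        $"{DateTime.UtcNow:u} WORKER Busy={maxWorker - freeWorker} Free={freeWorker} Min={minWorker} " +
                        $"IOCP Busy={maxIocp - freeIocp} Free={freeIocp} Min={minIocp}");

                    await Task.Delay(interval, token);
                }
            }, token);
    }

    // Usage: ThreadPoolLogger.Start(TimeSpan.FromSeconds(5), CancellationToken.None);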

    You can configure your ThreadPool Settings to make sure that your thread pool scales up quickly under burst scenarios.

    I hope you find this additional information helpful.
