We are experiencing issues under high load on our .NET Core 3.1 application.
Beyond a certain number of connections (virtual users), we hit a bottleneck: the server is starved and requests time out, but the process doesn't crash (no Kestrel logs). We are using K6 to benchmark our app. For now, the load test only performs GET requests on the login page, each of which triggers one basic SQL query on a small dataset (no joins, etc.).
We used the Visual Studio 2019 Performance Profiler and PerfView to investigate the issue, but neither tool helped us identify the portion of code causing the bottleneck.
I found this article about ThreadPool starvation: https://learn.microsoft.com/fr-fr/archive/blogs/vancem/diagnosing-net-core-threadpool-starvation-with-perfview-why-my-service-is-not-saturating-all-cores-or-seems-to-stall
When we raise the ThreadPool minimum to arbitrary values, as in the example below, we get a huge improvement in performance (not on the graph). This seems like a stopgap; how bad is it to use it?
System.Threading.ThreadPool.SetMinThreads(200, 200);
Explanation: 2C_2G/100.csv => 2 cores, 2 GB RAM, 100 virtual users
Environment:
- nginx as reverse proxy
- K6 as benchmark tool
- .NET Core 3.1 (with Entity Framework)
- operating system: Ubuntu 20.04
- MariaDB as database
2 Answers
You're executing long-running code while on the thread pool.
Here's a way to do that with Task.Run: wrap a synchronous, blocking call, such as a stream read, in it.
To the casual observer that looks like completely async code because there's async/await and Task everywhere. But in fact that will tie up a thread pool thread for as long as it takes to read the stream (which depends not just on how much data comes through, but on the bandwidth of the stream as well).
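The code that originally accompanied this answer is not available here, so the following is only a minimal sketch of the kind of pattern being described. The class and method names (`BlockingExample`, `CountBytesBlockingAsync`) are hypothetical and chosen purely for illustration.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class BlockingExample
{
    // Hypothetical helper: returns a Task and can be awaited, so it looks
    // asynchronous to callers. The lambda, however, calls the synchronous
    // Stream.Read, so one thread pool thread stays blocked until the whole
    // stream has been consumed.
    public static Task<long> CountBytesBlockingAsync(Stream stream)
    {
        return Task.Run(() =>
        {
            var buffer = new byte[8192];
            long total = 0;
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                total += read;
            }
            return total;
        });
    }
}
```

A caller would write `await BlockingExample.CountBytesBlockingAsync(stream)` and see nothing suspicious, which is what makes this pattern easy to miss in a code review.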
When the thread pool is starved, there's a one-second delay before the thread pool will spawn a new thread. That means that subsequent calls to Task.Run will have their work delayed for that long, even if your CPU is sitting idle.
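One way to observe this delay on your own machine is to queue more blocking work items than the pool has threads and record how long each one waited before it started. This is a rough sketch, not a benchmark; `StarvationDemo` and `MeasureStartDelaysAsync` are hypothetical names, and `Thread.Sleep` merely stands in for blocking I/O.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class StarvationDemo
{
    // Queues `count` blocking work items with Task.Run and returns, for each
    // one, the time that elapsed between queuing and the moment it started.
    public static async Task<TimeSpan[]> MeasureStartDelaysAsync(int count, TimeSpan blockFor)
    {
        long start = Stopwatch.GetTimestamp();

        Task<TimeSpan>[] tasks = Enumerable.Range(0, count)
            .Select(_ => Task.Run(() =>
            {
                // Capture the wait time first, then block the thread.
                long ticks = Stopwatch.GetTimestamp() - start;
                var startedAfter = TimeSpan.FromSeconds(ticks / (double)Stopwatch.Frequency);
                Thread.Sleep(blockFor); // stand-in for a blocking call
                return startedAfter;
            }))
            .ToArray();

        return await Task.WhenAll(tasks);
    }
}
```

Calling it with a count well above the value reported by `ThreadPool.GetMinThreads` and a block time of a few seconds, then printing `delays.Max()`, should show the later work items waiting noticeably longer than the first ones. The exact numbers depend on the runtime version and the machine.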
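Alternatives:
- Use truly asynchronous APIs (e.g. Stream.ReadAsync), especially when you're on the thread pool.
- Use Task.Factory.StartNew with TaskCreationOptions.LongRunning. The TaskCreationOptions.LongRunning flag tells the task scheduler that you want a new thread spawned immediately just for your work.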
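As a hedged illustration of the two alternatives, here is a sketch that counts the bytes in a stream both ways. The names `Alternatives`, `CountBytesAsync` and `CountBytesOnDedicatedThread` are hypothetical; only `Stream.ReadAsync`, `Task.Factory.StartNew` and `TaskCreationOptions.LongRunning` come from the framework.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class Alternatives
{
    // Alternative 1: a truly asynchronous read. While the read is pending,
    // no thread pool thread is blocked waiting for data.
    public static async Task<long> CountBytesAsync(Stream stream, CancellationToken ct = default)
    {
        var buffer = new byte[8192];
        long total = 0;
        int read;
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length, ct)) > 0)
        {
            total += read;
        }
        return total;
    }

    // Alternative 2: for blocking work that cannot be made asynchronous,
    // LongRunning asks the default scheduler for a dedicated thread instead
    // of occupying a thread pool thread. The delegate is deliberately
    // synchronous; the hint is of little use with an async lambda.
    public static Task<long> CountBytesOnDedicatedThread(Stream stream)
    {
        return Task.Factory.StartNew(() =>
        {
            var buffer = new byte[8192];
            long total = 0;
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                total += read;
            }
            return total;
        }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }
}
```

In a web application the first option is usually the better fit, since a dedicated thread per request does not scale much better than a starved pool.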
Yes, increasing the minimum worker thread count is not a solution, but a stopgap.
It seems that you are able to reproduce the issue. In that case, I suggest using dotnet-dump to figure out where the blocking code is. Follow the steps in this YouTube video on diagnosing thread pool starvation; it is pretty effective.
BTW, for the stopgap code, I would read the current minimum for the async I/O pool and pass it back unchanged as the 2nd argument, if that's not causing any trouble, and also check the return value of the call.
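The snippet that originally followed this answer is not available here, so this is a sketch of what the suggestion could look like. `ThreadPoolConfig` and `TrySetMinWorkerThreads` are hypothetical names; `ThreadPool.GetMinThreads` and `ThreadPool.SetMinThreads` are the framework calls being discussed.

```csharp
using System;
using System.Threading;

public static class ThreadPoolConfig
{
    // Sets the minimum worker thread count while leaving the minimum for
    // asynchronous I/O completion threads at its current value.
    // Returns false if the runtime rejected the change.
    public static bool TrySetMinWorkerThreads(int minWorkerThreads)
    {
        // Read the current I/O minimum so it can be passed back unchanged.
        ThreadPool.GetMinThreads(out _, out int ioThreads);

        // SetMinThreads reports failure through its return value, for
        // example when a value is negative or exceeds the maximum.
        bool applied = ThreadPool.SetMinThreads(minWorkerThreads, ioThreads);
        if (!applied)
        {
            Console.Error.WriteLine(
                $"ThreadPool.SetMinThreads({minWorkerThreads}, {ioThreads}) was rejected.");
        }
        return applied;
    }
}
```

It would typically be called once at startup, e.g. `ThreadPoolConfig.TrySetMinWorkerThreads(200);` to mirror the value from the question.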