In my colleagues' code I've noticed something like this:
private Task FetchAllKeysFromRedis(List<string> keys, ConcurrentBag<RedisResult> resultsBag, CancellationToken cancellationToken)
{
    var parallelism = Environment.ProcessorCount;
    var semaphore = new SemaphoreSlim(initialCount: parallelism, maxCount: parallelism);
    var tasks = new List<Task>();
    foreach (var key in keys)
    {
        cancellationToken.ThrowIfCancellationRequested();
        tasks.Add(FetchFromRedis(key, semaphore, resultsBag, cancellationToken));
    }
    return Task.WhenAll(tasks);
}
The List<string> keys contains around 1000 keys, and FetchFromRedis asynchronously executes an I/O operation (a fetch from Redis), so in total about 1000 I/O operations are executed.
The critical section looks like this:
private async Task FetchFromRedis(string key, SemaphoreSlim semaphore, ConcurrentBag<RedisResult> resultsBag, CancellationToken cancellationToken)
{
    await semaphore.WaitAsync(cancellationToken);
    try
    {
        var redisResult = await _getRedisResultFromRedis.ExecuteAsync(key, cancellationToken);
        if (redisResult != null)
            resultsBag.Add(redisResult);
    }
    finally
    {
        semaphore.Release();
    }
}
My question: is Environment.ProcessorCount a sensible maxCount for the SemaphoreSlim? Since it limits the number of concurrent operations that can enter the critical section to something very low, like 8, won't the execution time be much longer? If it is not sensible, what would a sensible value for maxCount be?
My opinion: since the work involves I/O, most of the time is spent waiting and no processor time is needed, so limiting concurrency to the number of cores makes little sense.
2 Answers
I would agree with your reasoning: since the work looks to be network requests limited by latency or bandwidth, restricting the concurrency to the number of CPU cores makes little sense.
You probably need to do some testing to find a suitable maxCount value, but ideally you would have a system that self-adapts to the actual circumstances. I'm honestly not sure what the best design would be, but I would take a look at TPL Dataflow to see if it provides a better way to limit concurrency (a sketch follows below), or at least expose the value as configuration so it can be adjusted later.
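For illustration, here is a minimal sketch of the same fan-out built on an ActionBlock from TPL Dataflow (the System.Threading.Tasks.Dataflow NuGet package). The limit of 32 is an arbitrary placeholder, and _getRedisResultFromRedis is reused from the question's code:

private async Task FetchAllKeysWithDataflow(List<string> keys, ConcurrentBag<RedisResult> resultsBag, CancellationToken cancellationToken)
{
    // The block enforces the concurrency limit itself, so no SemaphoreSlim is needed.
    var block = new ActionBlock<string>(
        async key =>
        {
            var redisResult = await _getRedisResultFromRedis.ExecuteAsync(key, cancellationToken);
            if (redisResult != null)
                resultsBag.Add(redisResult);
        },
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 32, // arbitrary placeholder; tune or read from configuration
            CancellationToken = cancellationToken
        });

    foreach (var key in keys)
        block.Post(key);

    block.Complete();
    await block.Completion;
}

A nice property of this approach is that the concurrency limit lives in one place (the block options) instead of being threaded through every method as a semaphore.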
Configuring the degree of parallelism with the value Environment.ProcessorCount is sensible in only one case: when you have absolutely no clue how to configure this setting. In that case any value is equally random and arbitrary, so why not choose Environment.ProcessorCount, which at least somehow reflects the capabilities of the current machine?

A similar dilemma surfaced while designing the Parallel.ForEachAsync API, which debuted in .NET 6. After evaluating the available options, the Microsoft API designers chose Environment.ProcessorCount as the value for MaxDegreeOfParallelism when this configuration is not explicitly provided. It's ironic that the synchronous Parallel.ForEach API, for which Environment.ProcessorCount would be an even more sensible default, actually has -1 as its default, which means "unlimited" parallelism, or practically parallelism limited by ThreadPool availability.
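For reference, this is roughly how the question's loop would look on .NET 6+ with Parallel.ForEachAsync. Omitting MaxDegreeOfParallelism gives you the Environment.ProcessorCount default discussed above; the 32 below is just an arbitrary override for illustration:

var options = new ParallelOptions
{
    MaxDegreeOfParallelism = 32, // arbitrary; omit to get the Environment.ProcessorCount default
    CancellationToken = cancellationToken
};

await Parallel.ForEachAsync(keys, options, async (key, ct) =>
{
    var redisResult = await _getRedisResultFromRedis.ExecuteAsync(key, ct);
    if (redisResult != null)
        resultsBag.Add(redisResult);
});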