I have an ASP.NET Core 3.1 API handling ~500 requests/second, and I'm doing pagination over a pure in-memory collection. The collection is large: roughly 700K to 1 million items. I'm only using LINQ `Where`, `OrderBy`, `Skip` and `Take`, which is the simplest pagination approach.
Let's say my class looks like this:
public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime AgeUtc { get; set; }
}
private IEnumerable<ItemDto> PopulateItems(int? watermark, UiParameters uiParameters)
{
    IEnumerable<ItemDto> items;
    if (watermark.HasValue)
    {
        // Linear scan: filter by watermark, then page.
        items = MemoryCacheService.Instance.ReadOnlyItems
            .Where(x => x.Id > watermark)
            .Skip((uiParameters.PageNumber - 1) * uiParameters.PageSize)
            .Take(uiParameters.PageSize);
        return items;
    }

    // No watermark: page over the whole collection.
    items = MemoryCacheService.Instance.ReadOnlyItems
        .Skip((uiParameters.PageNumber - 1) * uiParameters.PageSize)
        .Take(uiParameters.PageSize);
    return items;
}
My problems are:

- CPU usage is really high.
- Caching the endpoint with the Response Caching middleware usually causes an out-of-memory exception, because the watermark value leads to highly randomized URLs.
- Most HTTP requests are very slow (> ~8 seconds) and some of them fail (at 500 requests/second).

I'm sure there is a better way to make this more efficient. Any help?
Edit 1
------
We did a kind of partitioning (in memory) by splitting our collection into smaller lists, and the results are better. But I still believe we are on the wrong path. We tried Redis before the in-memory collection, and things became complicated when the user count grew from 1 to 1.6 million: we hit a lot of network exceptions and issues related to the StackExchange.Redis multiplexer. Now, with the in-memory solution, we at least get no exceptions, but it's slow!

SQL Azure is really expensive! And when we load-tested our API, the SQL database went down after 30 or 40 concurrent connections!
Edit 2
------
We modified the implementation above to look like this, and the performance became much better. But I still believe we are overcomplicating things and this is not the best way to do it.
public class MemoryCacheService
{
    ...
    public ImmutableSortedDictionary<int, List<ItemDto>> PartitionedReadOnlyItems { get; set; }
    public List<ItemDto> ReadOnlyItems { get; }
    ...
}
private IEnumerable<ItemDto> PopulateItems(int? watermark, UiParameters resourceParameters)
{
    IEnumerable<ItemDto> items;
    if (watermark.HasValue)
    {
        // If the watermark is smaller than the first key in the partitioned list, clamp it.
        var firstKeyValue = MemoryCacheService.Instance.PartitionedReadOnlyItems.First().Key;
        if (watermark < firstKeyValue)
        {
            watermark = firstKeyValue - 1;
        }

        var skipper = (resourceParameters.PageNumber - 1) * resourceParameters.PageSize;
        var cursor = watermark.Value + skipper;
        var startKey = cursor - resourceParameters.PageSize;
        var endKey = cursor + resourceParameters.PageSize;

        // This logic fetches two partitions and merges their values into a single sequence.
        var patientLocationsPages = MemoryCacheService.Instance.PartitionedReadOnlyItems
            .Where(k => k.Key >= startKey && k.Key <= endKey)
            .Select(s => s.Value)
            .SelectMany(p => p.AsEnumerable());

        items = patientLocationsPages
            .Where(p => p.Id > cursor)
            .Take(resourceParameters.PageSize);
        return items;
    }

    items = MemoryCacheService.Instance.ReadOnlyItems
        .Skip((resourceParameters.PageNumber - 1) * resourceParameters.PageSize)
        .Take(resourceParameters.PageSize);
    return items;
}
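The code that builds the partitions isn't shown above; the idea is roughly to chunk the Id-sorted list and key each chunk by the first Id it contains. A minimal sketch of that approach (the `chunkSize` parameter and the `BuildPartitions` helper are illustrative, not our exact code; it uses System.Linq and System.Collections.Immutable):

public static class PartitionBuilder
{
    // Sketch only: split an Id-sorted sequence into fixed-size chunks and key each
    // chunk by the first Id it contains, matching the startKey/endKey lookup above.
    public static ImmutableSortedDictionary<int, List<ItemDto>> BuildPartitions(
        IEnumerable<ItemDto> source, int chunkSize)
    {
        return source
            .OrderBy(i => i.Id)
            .Select((item, index) => (item, index))
            .GroupBy(x => x.index / chunkSize)
            .ToImmutableSortedDictionary(
                g => g.First().item.Id,                 // partition key = first Id in the chunk
                g => g.Select(x => x.item).ToList());   // partition value = the chunk itself
    }
}

A smaller chunk size keeps each partition cheap to scan, at the cost of more dictionary entries to filter per request.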
2 Answers
---------
Oh boy. This will never work: you're doing linear searches over a collection of up to a million elements, 500 times a second. You need to implement indexing of some sort on top of your collection, even something as simple as recording the index of every 50th `Id`, so you don't have to start from the beginning every time with `.Where(x => x.Id > watermark)`. If the collection is sorted on `Id`, even a binary search would be better than what you're doing right now.

You don't explain how you are creating your `ReadOnlyItems`, but if you can use a `SortedList<TKey, TValue>` then you can greatly speed up your method. Add your `Item`s to the sorted list using `Item.Id` as the key. In my tests with a million random `Item`s and an `Id` spread of 4:1, I get about 0.02–0.04 ms with `SortedList` versus 1.2–4 ms for the first lookup (about 100 times faster), and about 0.001 ms versus 3.5 ms for subsequent lookups (about 400 times faster). Then, you can do:
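For example, a sketch along these lines, assuming the cache exposes a `SortedList<int, ItemDto>` keyed by `Id` (the `SortedItems` property name is an assumption):

private IEnumerable<ItemDto> PopulateItems(int? watermark, UiParameters uiParameters)
{
    // Assumed: MemoryCacheService keeps a SortedList<int, ItemDto> keyed by Id.
    var sorted = MemoryCacheService.Instance.SortedItems;

    var start = 0;
    if (watermark.HasValue)
    {
        // Binary search the sorted keys for the first Id greater than the watermark,
        // instead of scanning the whole collection with Where.
        var keys = sorted.Keys; // IList<int>, ascending
        var lo = 0;
        var hi = keys.Count;
        while (lo < hi)
        {
            var mid = lo + (hi - lo) / 2;
            if (keys[mid] > watermark.Value) hi = mid; else lo = mid + 1;
        }
        start = lo;
    }

    // SortedList<>.Values implements IList<ItemDto>, so Skip/Take can jump by index
    // rather than enumerating every element.
    return sorted.Values
        .Skip(start + (uiParameters.PageNumber - 1) * uiParameters.PageSize)
        .Take(uiParameters.PageSize);
}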
NOTE: `SortedList<>.Values` implements `IList`, and `Skip` and `Take` have special index-based optimizations for it.

You can use this extension method to convert to a `SortedList` from an `IEnumerable`:
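A sketch of such an extension method, taking a caller-supplied key selector (an illustrative implementation, not the answer's original):

public static class SortedListExtensions
{
    // Builds a SortedList<TKey, TValue> from any sequence using a key selector.
    // Pre-sizes the list when the source count is known; Add throws on duplicate keys.
    public static SortedList<TKey, TValue> ToSortedList<TKey, TValue>(
        this IEnumerable<TValue> source,
        Func<TValue, TKey> keySelector)
    {
        var capacity = source is ICollection<TValue> collection ? collection.Count : 0;
        var result = new SortedList<TKey, TValue>(capacity);
        foreach (var item in source)
        {
            result.Add(keySelector(item), item);
        }
        return result;
    }
}

With that in place, the cache could hold something like `items.ToSortedList(i => i.Id)` instead of a flat list.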