
I have an ASP.NET Core 3.1 API serving ~500 requests/second.
I'm doing pagination over a purely in-memory collection.
The collection is large: ~700K to 1 million items.
I'm only using LINQ "Where", "OrderBy", "Take", and "Skip", which is the simplest pagination approach.
Let's say my class looks like this:

public class Item
{
    public int Id { get; set; }
    public string Name { get; set; }
    public DateTime AgeUtc { get; set; }
}
private IEnumerable<ItemDto> PopulateItems(int? watermark, UiParameters uiParameters)
{
    IEnumerable<ItemDto> items;

    if (watermark.HasValue)
    {
        items = MemoryCacheService.Instance.ReadOnlyItems
            .Where(x => x.Id > watermark)
            .Skip((uiParameters.PageNumber - 1) * uiParameters.PageSize)
            .Take(uiParameters.PageSize);

        return items;
    }

    items = MemoryCacheService.Instance.ReadOnlyItems
        .Skip((uiParameters.PageNumber - 1) * uiParameters.PageSize)
        .Take(uiParameters.PageSize);

    return items;
}

My problem is:

  • CPU usage is really high.
  • Caching my endpoint with the Response Caching Middleware usually causes an out-of-memory exception, because the watermark value produces highly randomized URLs.
  • Most HTTP requests are very slow (> ~8 seconds) and some of them fail (at 500 requests/second).

I'm sure there is a better way to make this more efficient. Any help? 🙂

Edit 1
------

We did a kind of partitioning (in memory) by splitting our collection into smaller lists, and the results are better. But I still believe we are on the wrong path. We tried Redis before the in-memory collection, and things became complicated when users grew from 1 to 1.6 million: we got a lot of network exceptions and issues related to the StackExchange.Redis multiplexer. With the in-memory solution we at least get no exceptions, but it's slow!
SQL Azure is really expensive! And when we load-tested our API, SQL went down after 30 or 40 concurrent connections!
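Roughly, the in-memory partitioning splits the Id-sorted collection into fixed-size buckets keyed by each bucket's first Id, so a watermark lookup only touches the relevant buckets instead of scanning everything. A minimal sketch (the bucket size, names, and int-only payload are illustrative, not our actual code):

```csharp
using System;
using System.Collections.Generic;

public static class Partitioner
{
    // Split an Id-sorted list into fixed-size buckets keyed by the first
    // Id in each bucket. A watermark lookup can then jump straight to the
    // relevant bucket instead of scanning the whole collection.
    public static SortedDictionary<int, List<int>> Partition(List<int> sortedIds, int bucketSize)
    {
        var buckets = new SortedDictionary<int, List<int>>();
        for (int i = 0; i < sortedIds.Count; i += bucketSize)
        {
            int count = Math.Min(bucketSize, sortedIds.Count - i);
            buckets[sortedIds[i]] = sortedIds.GetRange(i, count);
        }
        return buckets;
    }
}
```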

Edit 2
------

We modified the implementation above as follows, and performance became much better. But I still believe we are making things complicated, and this is not the best way to do it.

public class MemoryCacheService {
    ...
    public ImmutableSortedDictionary<int, List<ItemDto>> PartitionedReadOnlyItems { get; set; }
    public List<ItemDto> ReadOnlyItems { get; }
    ...
}
private IEnumerable<ItemDto> PopulateItems(int? watermark, UiParameters resourceParameters)
{
    IEnumerable<ItemDto> items;

    if (watermark.HasValue)
    {
        // Clamp the watermark if it is smaller than the first key in the list
        var firstKeyValue = MemoryCacheService.Instance.PartitionedReadOnlyItems.First().Key;
        if (watermark < firstKeyValue)
        {
            watermark = firstKeyValue - 1;
        }

        var skipper = (resourceParameters.PageNumber - 1) * resourceParameters.PageSize;
        var cursor = watermark.Value + skipper;
        var startKey = cursor - resourceParameters.PageSize;
        var endKey = cursor + resourceParameters.PageSize;

        // This logic fetches the two neighboring partitions and merges their values into a single sequence
        var patientLocationsPages = MemoryCacheService.Instance.PartitionedReadOnlyItems
                .Where(k => k.Key >= startKey && k.Key <= endKey)
                .Select(s => s.Value)
                .SelectMany(p => p.AsEnumerable());

        items = patientLocationsPages
               .Where(p => p.Id > cursor)
               .Take(resourceParameters.PageSize);

        return items;
    }

    items = MemoryCacheService.Instance.ReadOnlyItems
        .Skip((resourceParameters.PageNumber - 1) * resourceParameters.PageSize)
        .Take(resourceParameters.PageSize);

    return items;
}

2 Answers


  1. This is not a SQL query; it's an in-memory collection.

    Oh boy. This will never work: you're doing linear searches through a collection of up to a million elements, 500 times a second.

    You need to implement indexing of some sort on top of your collection, even something as simple as recording the index of every 50th Id, so you don't have to scan from the beginning every time with .Where(x => x.Id > watermark). If the collection is sorted on Id, even a binary search would be far better than what you're doing right now.
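    The binary-search idea can be sketched as follows, assuming the backing list is sorted ascending by Id (the method name and the int-only list are illustrative, not from the question):

```csharp
using System.Collections.Generic;

public static class WatermarkSearch
{
    // Binary search for the index of the first item whose Id is greater
    // than the watermark. Assumes `ids` is sorted ascending.
    // Runs in O(log n) instead of the O(n) scan of .Where(x => x.Id > watermark).
    public static int FindFirstIdGreaterThan(IReadOnlyList<int> ids, int watermark)
    {
        int lo = 0, hi = ids.Count; // search window [lo, hi)
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (ids[mid] <= watermark)
                lo = mid + 1;   // everything up to mid is <= watermark
            else
                hi = mid;       // mid could be the answer; keep it in range
        }
        return lo; // equals ids.Count when no Id exceeds the watermark
    }
}
```

    The returned index can then feed the usual Skip/Take paging, replacing the full scan.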

  2. You don't explain how you create your ReadOnlyItems, but if you can use a SortedList<TKey,TValue>, you can greatly speed up your method.

    Add your Items to the sorted list using Item.Id as the key.

    In my tests with a million random Items and an Id spread of 4:1, I get about 0.02 to 0.04 ms with SortedList versus 1.2 to 4 ms for the first lookup (about 100 times faster), and about 0.001 ms versus 3.5 ms for subsequent lookups (about 400 times faster).

    Then, you can do:

    private IEnumerable<ItemDto> PopulateItems(int? watermark, UiParameters uiParameters) {
        IEnumerable<ItemDto> items;
    
        if (watermark.HasValue) {
            var firstIndex = MemoryCacheService.Instance.ReadOnlyItems.IndexOfKey(watermark.Value) + 1;
    
            items = MemoryCacheService.Instance.ReadOnlyItems.Values
                    .Skip(firstIndex + (uiParameters.PageNumber - 1) * uiParameters.PageSize)
                    .Take(uiParameters.PageSize);
        }
        else
            items = MemoryCacheService.Instance.ReadOnlyItems.Values
                .Skip((uiParameters.PageNumber - 1) * uiParameters.PageSize)
                .Take(uiParameters.PageSize);
    
        return items;
    }
    

    NOTE: SortedList<>.Values implements IList<T>, and Skip and Take have special index-based optimizations for IList<T> sources.

    You can use this extension method to convert an IEnumerable to a SortedList:

    public static SortedList<TKey, TValue> ToSortedList<T, TKey, TValue>(this IEnumerable<T> items, Func<T, TKey> keyFn, Func<T, TValue> valueFn) {
        var ans = new SortedList<TKey, TValue>();
        foreach (var item in items)
            ans.Add(keyFn(item), valueFn(item));
        return ans;
    }
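    For example, combining the extension with IndexOfKey-based paging (the ItemDto shape and sample values are illustrative; the extension is repeated here only so the snippet compiles on its own):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ItemDto
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class SortedListExtensions
{
    // The ToSortedList extension from above.
    public static SortedList<TKey, TValue> ToSortedList<T, TKey, TValue>(
        this IEnumerable<T> items, Func<T, TKey> keyFn, Func<T, TValue> valueFn)
    {
        var ans = new SortedList<TKey, TValue>();
        foreach (var item in items)
            ans.Add(keyFn(item), valueFn(item));
        return ans;
    }
}

public static class Example
{
    public static List<ItemDto> Run()
    {
        var source = new[]
        {
            new ItemDto { Id = 3, Name = "c" },
            new ItemDto { Id = 1, Name = "a" },
            new ItemDto { Id = 2, Name = "b" },
        };

        // Build the sorted cache once, keyed by Id.
        SortedList<int, ItemDto> readOnlyItems = source.ToSortedList(x => x.Id, x => x);

        // IndexOfKey finds the watermark position in O(log n);
        // paging starts right after it.
        int firstIndex = readOnlyItems.IndexOfKey(1) + 1;
        return readOnlyItems.Values.Skip(firstIndex).Take(2).ToList(); // Ids 2 and 3
    }
}
```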
    