I have used Redis’ Sorted sets to build fast leaderboard solutions for games.
There are two main query operations I needed for my task:
- Get elements and their scores from index A to B in descending order (ZREVRANGE in Redis)
- Get an index and a score of a particular element in descending order (ZREVRANK in Redis)
As I know Redis uses special data structure for Sorted Set, which combines some kind of skip list with hash table.
Now I want to migrate from Redis to modern IMDG solutions and choosing between Hazelcast and Apache Ignite. What are the closest analogs of Redis Sorted Sets and ZREVRANGE, ZREVRANK operations in Hazelcast/Apache Ignite?
3
Answers
In Hazelcast you can create
IndexType.SORTED
. Please check the related documentation, Hazelcast Reference Manual: Indexing Queries.It seems that it’s possible to leverage query entities in
Apache Ignite
.Here’s a raw example.
After that it will be possible to use regular
SQL
queries to obtain data from a cache. Table name is going to be"USERINFO"
as it’s inherited from the type name:UserInfo
, anyway it’s possible to reconfigure that. A schema comes from a cache name.For instance:
It will make a range index scan.
You can run an
EXPLAIN
command to check what index is being used (in the example we have 2 of them).It seems like a really complex task to implement a leaderboard in a distributed system like IMDG. Correct me if I’m wrong but ZSET is a more a local thing in
Redis
, you can’t have a Sorted Set bigger than your biggest shard.I’d like to split the issue into two separate cases: local and distributed.
Ignite
it could be achieved by making a cache REPLICATED. It would be possible to use localSQL
queries withLIMIT/OFFSET
but as far as I know there’s no fast (O(log(n)
) index jump to start further scanning from some row with a particular number. It makes it impossible to achieve overall complexity ofO(log(n) + m)
wherem
is a window size. Theoretically it’s possible to implement that. I’ve initiated the discussion on the "Apache Ignite Developers" list. Anyway it all depends on your real use-case: how many users you have, what’s the target latency, percentiles etc.a
tob
because data distribution (affinity maps a key to a node) can be not absolutely even. It means that you need to scan the entire index on every node until you reachb
(or closest value above).