
I have a bunch of news articles (~100k at the moment). Every article can belong to more than one category, so I have two keys: article URL and category. I need to store these articles in Redis and group them by category. I need to fetch all articles of a given category quickly, and a single article by its URL. I also need every article to have an expiration time.

I decided to use hashes, but then realized that individual hash items have no expiration time.

In short, I'm not sure where best to go from here. I'm still pretty new to this area. I wonder if there are some best practices for this.

2 Answers


  1. You can use sorted sets to store the list of articles for each category, with the expire time as the score and the article id as the value. I don't recommend storing the whole article text as the value: an article that belongs to several categories would be duplicated in every one of those sorted sets (categories a, b and c may all contain articles 1, 2 and 3), so memory usage could increase dramatically.

    While setting an article:

    • SET the article with its id as the key and the EX option. (The text goes here.)
    • SET the article with its URL as the key and the text as the value, with the EX option. (I am skipping this part, it is already clear on your side.)
    • For each category the article has, run ZREMRANGEBYSCORE (from -inf to the current timestamp) to remove already expired articles.
    • Push the id of the article to each category's sorted set with ZADD (the expire time is the score and the article id is the value).

    While reading a category:

    • Get the article ids page by page with ZRANGE.
    • You may also run ZREMRANGEBYSCORE before ZRANGE, so that expired ids are not returned.
    • ZRANGE gives you the article ids; then you can use GET to fetch the text of each article.

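    For simplicity I used small numbers in place of real timestamps: the score 3 stands for the expire time and 20 for the current time, so the second ZREMRANGEBYSCORE removes article:1 from category:1.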

    127.0.0.1:6379> SET article:1 very-long-article-text EX 120
    OK
    127.0.0.1:6379> ZREMRANGEBYSCORE category:1 -inf 20
    (integer) 0
    127.0.0.1:6379> ZADD category:1 3 article:1
    (integer) 1
    127.0.0.1:6379> SET article:2 article-details EX 120
    OK
    127.0.0.1:6379> ZREMRANGEBYSCORE category:1 -inf 20
    (integer) 1
    127.0.0.1:6379> ZADD category:1 3 article:2
    (integer) 1
    127.0.0.1:6379> ZRANGE category:1 0 9
    1) "article:2"
    127.0.0.1:6379> ZADD category:2 3 article:2
    (integer) 1
    

    If you don't want to keep the article id in the sorted set, you can store the text itself as the value instead, which removes the need for GET after you fetch the ids.
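The steps above can be sketched in Python roughly as follows. This is only a sketch under assumptions: `r` is assumed to be a redis-py client created with `decode_responses=True`, the function names `store_article` and `get_category_articles` are made up for illustration, and the `article:<id>` / `category:<name>` key names follow the redis-cli session above.

```python
import time


def store_article(r, article_id, text, categories, ttl_seconds, now=None):
    """Store one article and index its key under each of its categories."""
    now = int(time.time()) if now is None else now
    expires_at = now + ttl_seconds
    key = f"article:{article_id}"
    # The text lives in a single string key that Redis expires on its own.
    r.set(key, text, ex=ttl_seconds)
    for category in categories:
        zkey = f"category:{category}"
        # Drop members whose score (expire timestamp) is already in the past.
        r.zremrangebyscore(zkey, "-inf", now)
        # Score = expire timestamp, member = article key.
        r.zadd(zkey, {key: expires_at})
    return expires_at


def get_category_articles(r, category, start=0, stop=9, now=None):
    """Return {article key: text} for one page of a category."""
    now = int(time.time()) if now is None else now
    zkey = f"category:{category}"
    # Clean up expired members first so they are not returned.
    r.zremrangebyscore(zkey, "-inf", now)
    articles = {}
    for key in r.zrange(zkey, start, stop):
        text = r.get(key)
        # The string key may have expired a moment ago; skip it if so.
        if text is not None:
            articles[key] = text
    return articles
```

The `now` parameter is there only so the cutoff can be controlled in tests; in normal use you would leave it out and let the functions read the clock.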

  2. @Ersoy's answer is a good solution for your case, and I think a few enhancements are possible:

    1. I recommend not using the URL directly as the Redis key, since a URL can be very long and include special characters (like '/'). Hashing it with MD5 or encoding it with Base64 would be an improvement.
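As a small illustration of the MD5 variant, here is a sketch using only Python's standard `hashlib`. The helper name `article_key` and the `article:` prefix are assumptions for the example, not something from the question.

```python
import hashlib


def article_key(url, prefix="article:"):
    """Build a fixed-length Redis key from an article URL."""
    # An MD5 hex digest is always 32 characters from [0-9a-f],
    # however long the URL is and whatever characters it contains.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return prefix + digest
```

MD5 is used here only to derive a short key, not for security. It is one-way, so if you need the original URL back, store it alongside the article text.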

    2. When an update takes several Redis commands, think about concurrent clients if you execute those commands one by one. Alternatively, use a transaction or a lock to make the update atomic.
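One way to do the transaction variant is a MULTI/EXEC pipeline. A rough sketch, assuming `r` is a redis-py client; the function name `store_article_atomic` and its parameters are made up for the example:

```python
def store_article_atomic(r, article_key, text, category_keys, ttl_seconds, now):
    """Queue all writes for one article and send them as one MULTI/EXEC."""
    expires_at = now + ttl_seconds
    # transaction=True makes redis-py wrap the queued commands in MULTI/EXEC,
    # so other clients never see a half-applied update.
    pipe = r.pipeline(transaction=True)
    pipe.set(article_key, text, ex=ttl_seconds)
    for zkey in category_keys:
        # Remove expired members, then index the article by its expire time.
        pipe.zremrangebyscore(zkey, "-inf", now)
        pipe.zadd(zkey, {article_key: expires_at})
    # Nothing is sent to the server until execute() is called.
    return pipe.execute()
```

This also saves round trips, since all the commands go to the server in one batch.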
