I have a bunch of news articles (~100k at the moment). Every article can belong to more than one category, so I have two keys: the article URL and the category. I need to store these articles in Redis and group them by category. I need to fetch all articles of any category quickly, and a single article by its URL. I also need every article to have an expiration time.
I decided to use hashes, but then realized that hash fields have no expiration time of their own.
In short, I’m not sure where best to go from here. I’m still pretty new to this area, and I wonder if there are any best practices for this.
2 Answers
You may use sorted sets to store the list of articles for a specific category. Use the `score` as the expire time and the `id` as the value. I don’t recommend storing the whole article (the full text) as the value, because memory usage may increase dramatically: multiple categories will contain the same article, so you would need to store the same article in multiple sorted sets (categories a, b, c may all have articles 1, 2, 3).

While setting an article:
- `SET` the article (id as key) with the `EX` option. (The text will be here.)
- `SET` the article URL as key and the text as value with the `EX` option. (I am skipping this part; it is already clear on your side.)
- `ZREMRANGEBYSCORE` (from -inf to the current timestamp) to remove already expired articles.
- `ZADD` (the expire time will be the score and the value is the article id).

While getting the articles of a category:

- `ZRANGE` to get the article ids.
- `GET` to get the text of the articles.

For simplicity I used small expire times.
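The steps above can be sketched in Python with redis-py-style calls. This is only a minimal sketch under stated assumptions: the key prefixes `article:` and `category:`, the function names, and the `now` parameter are hypothetical, and `client` is assumed to be a `redis.Redis(decode_responses=True)` instance (or any object exposing the same methods).

```python
import time


def article_key(article_id):
    # Hypothetical key layout: one string key per article.
    return f"article:{article_id}"


def category_key(category):
    # Hypothetical key layout: one sorted set per category.
    return f"category:{category}"


def store_article(client, article_id, text, categories, ttl_seconds, now=None):
    """Store the text once and index its id in each category's sorted set."""
    now = int(time.time() if now is None else now)
    expires_at = now + ttl_seconds
    # SET ... EX: the article text expires on its own.
    client.set(article_key(article_id), text, ex=ttl_seconds)
    for category in categories:
        key = category_key(category)
        # Drop ids whose expire time (the score) has already passed.
        client.zremrangebyscore(key, "-inf", now)
        # Score = expire timestamp, member = article id.
        client.zadd(key, {article_id: expires_at})


def articles_in_category(client, category, now=None):
    """Return {article_id: text} for the unexpired articles of a category."""
    now = int(time.time() if now is None else now)
    key = category_key(category)
    client.zremrangebyscore(key, "-inf", now)
    result = {}
    # Assumes decode_responses=True, so ids come back as str, not bytes.
    for article_id in client.zrange(key, 0, -1):
        text = client.get(article_key(article_id))
        if text is not None:  # the string key may have expired already
            result[article_id] = text
    return result
```

With a real connection this might be called as `store_article(client, "a1", "some text", ["sports", "news"], ttl_seconds=30)` followed by `articles_in_category(client, "sports")`. Keeping only ids in the sorted sets means each article text is stored once, whatever the number of categories.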
If you don’t want to use the article id in the sorted set, you can store the text instead of the `id` and skip the `GET` step that follows fetching the ids.

@Ersoy’s answer is a good solution for your case, and I think there are some possible enhancements:
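I recommend not using the URL directly as the Redis key, since a URL can be very long and include special characters (like ‘/’). Encoding it with MD5 or Base64 would be an improvement. Note that only a hash such as MD5 actually shortens the key; Base64 output is longer than its input, and the standard Base64 alphabet itself contains ‘/’.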
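As a rough illustration of the hashing idea, the sketch below derives a fixed-length key from a URL with Python's standard `hashlib`. The `article:` prefix and the function name are assumptions made for the example, and MD5 is used here only to shorten the key, not for any security purpose.

```python
import hashlib


def article_key_for_url(url: str) -> str:
    # MD5 hex digest is always 32 characters, whatever the URL length.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    # Hypothetical prefix to keep article keys in their own namespace.
    return f"article:{digest}"
```

The same URL always maps to the same key, so a lookup by URL only needs to recompute the hash. If the original URL must be recoverable from Redis, it can be stored alongside the text, since a hash cannot be reversed.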
When an update takes multiple Redis commands, consider concurrency: if you execute the commands one by one, commands from other clients can run in between. You can use a transaction or a lock to make the update atomic.
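One way to do this with redis-py is a pipeline with `transaction=True`, which queues the commands and sends them wrapped in `MULTI`/`EXEC`. The sketch below is an illustration under assumptions: the function name and the `article:`/`category:` key prefixes are hypothetical, and `client` is assumed to be a `redis.Redis` instance.

```python
import time


def store_article_atomically(client, article_id, text, categories,
                             ttl_seconds, now=None):
    """Queue all writes for one article and run them as one MULTI/EXEC."""
    now = int(time.time() if now is None else now)
    expires_at = now + ttl_seconds
    # transaction=True makes redis-py wrap the queued commands in MULTI/EXEC.
    pipe = client.pipeline(transaction=True)
    pipe.set(f"article:{article_id}", text, ex=ttl_seconds)
    for category in categories:
        key = f"category:{category}"  # hypothetical key layout
        pipe.zremrangebyscore(key, "-inf", now)
        pipe.zadd(key, {article_id: expires_at})
    # Nothing is sent until execute(); returns one result per queued command.
    return pipe.execute()
```

Commands inside `MULTI`/`EXEC` run without other clients' commands interleaved, but Redis does not roll back earlier commands if a later one fails at execution time, so this gives isolation rather than full rollback semantics.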