skip to Main Content

I have read in a couple of places that some databases use bloom filters for finding a match when querying a database. In my example I’m using Postgresql which is one of those databases.

My question comes up when talking about implementing a bloom filter with Redis which has a module that allows you to use a bloom filter when entering a member in a set. (Keep in mind the complexity of the lookup process though, not retrieving that value from disk)

Now the benefits of using Redis is to store values in memory, which when then trying to retrieve that value is more performant vs looking it up in a rdbms because that value is stored on a disk.

In my example, say I’m checking if a username already exists, is it still worth using Redis in memory solution with a bloom filter vs just checking with a Postgresql query?

My flow is something like:
CheckIfUserExsits() // using Redis bloom filter
If TRUE then confirm with rdbms // do to x% probability of false positive nature of bloom filter
If rdbms == MATCH then reply with "User does exist"
Else don't check rdbms at all // do to 0% probability of false negative nature of bloom filter

This flow is supposed to be more preformant because you’re not querying a rdbms and doing this quickly via memory lookup returning false more efficiently.

However since all I care about is whether a member exists or not, to increase performance on replying with false, does the Redis step really help? Because if Postgresql is already querying a table using a bloom filter then the performance should already be relatively fast.

2

Answers


  1. Don’t add Redis unless you actually have performance issues. It adds a second datastore and development will be significantly slowed down and see more bugs because every data change needs to be synchronised between the two.

    Redis also doesn’t help much when it just replaces a function that’s at the core of what Postgres also does (well). Postgres also uses memory to cache data. Just make sure it’s configured with appropriate memory limits.

    Where Redis can be useful is in caching “derived” data, i. e. anything the database and your application have spent significant time on.

    Bloom filters is similar: do use them when you need hundreds of thousands of lookups every second. But default to using your existing database. Postgres is capable of rather decent performance –– I’ve been surprised by it quite a few times.

    Only optimise after measuring and identifying a problem.

    Login or Signup to reply.
  2. I have read in a couple of places that some databases use bloom filters for finding a match when querying a database. In my example I’m using Postgresql which is one of those databases.

    Could you provide links to the things you were reading?

    PostgreSQL does have a bloom filter index type extension (in “contrib”), but the index has to be created explicitly and it would not be useful for your use case anyway. It answers the question, for each individual row, “does this row satisfy a set of conditions”. It does not answer the question “does any row in this table satisfy this one single condition”.

    PostgreSQL also has a C-language bloom data structure for internal use, but again your requirement is not one of those things it is used for.

    Bloom filters of the type you want would be hard to implement in the face of PostgreSQL’s ACID/MVCC model and its storage model.

    If you really need this (and I doubt you do) then using redis seems like a good tool for the job. But how will you keep them in sync?

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search