
We have an Express API that connects to a Postgres database using Sequelize.

One of our endpoints returns a list of sales, and it accepts different query parameters like page, column order, filters, and a search term. The search is just ILIKEs against the search term on 8 columns, plus 2 other, more complex ILIKEs: one that compares the term against a concatenation of the client's first and last name (using the concat function), and another that compares it against an acronym of a location name (which uses regexp_replace on that column).
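
Roughly, the generated SQL looks something like this (table and column names are simplified here, the acronym regexp is only an approximation, and :term is the bind parameter for the search term):

    SELECT s.*
    FROM sales s
    JOIN clients c ON c.id = s.client_id
    JOIN locations l ON l.id = s.location_id
    WHERE s.invoice_number ILIKE '%' || :term || '%'
       OR s.notes          ILIKE '%' || :term || '%'
       -- ... 6 more plain ILIKE comparisons ...
       OR concat(c.first_name, ' ', c.last_name) ILIKE '%' || :term || '%'
       -- acronym of the location name, e.g. 'New York City' -> 'NYC'
       OR regexp_replace(l.name, '(\w)\w*\s*', '\1', 'g') ILIKE '%' || :term || '%';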

We’re testing with the client that has the most data (approx. 100k rows), and the endpoint usually takes between 1 and 2 seconds, which is acceptable, but when the search is performed it takes around 6 seconds.

There are some simple things we can do, like reducing the number of columns we search on (we’re evaluating whether all of them are necessary), dropping the acronym comparison, and replacing that concat for the full name with something else, like the || operator (which I’m trying to do in Sequelize). We can also add indexes.
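
For the full name, the idea would be to have Sequelize generate something along these lines instead of concat() (one caveat I'm aware of: || returns NULL if any operand is NULL, while concat() treats NULLs as empty strings):

    SELECT c.*
    FROM clients c
    WHERE (c.first_name || ' ' || c.last_name) ILIKE '%' || :term || '%';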

But other than that, what would you recommend? I’m evaluating Elasticsearch, but it may be overkill, and it needs time to set up and maintain.

I also read about Full Text Search in Postgres, which would make searching much faster and would give us more advanced search features (similar to Elasticsearch). Would you recommend it? I think you would pair this with GiST or GIN indexes, right?

Any other recommendation?

2 Answers


  1. Try using an index on those columns for text search:

    CREATE INDEX idx_fulltext_content ON documents USING GIN(to_tsvector('english', content));
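
    As I understand it, a query shaped like this should be able to use that GIN index, as long as the to_tsvector expression in the WHERE clause matches the one in the index definition (documents and content are just the example names from above; plainto_tsquery is one way to turn the raw search term into a tsquery):

    -- matches rows whose content contains all the words in the search term
    SELECT *
    FROM documents
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'search term');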
    

    I haven't used it before, but I do have experience with Elasticsearch.

    Elasticsearch is a big stack, and it's meant for large data sets, like 100k new documents per day.

    You just need a simple text search; you don't need to bring in Elasticsearch (a search engine's feature set) for that simple query.
    Try the features you already have before reaching for new technologies.

  2. For this problem, I would probably start by evaluating which of the operations is taking the most time. It may be the case that the concatenation and ILIKE clauses are taking up the time, or it may be that your storage is slow at loading the data into RAM. It’s very difficult to tell from just a description of the query.

    To that end, I highly recommend using Postgres’ query plan analysis tools. (https://www.postgresql.org/docs/current/sql-explain.html)

    EXPLAIN ANALYZE will give you a query plan (the internal execution steps that postgres is taking), which you can then copy-paste into a query plan visualizer like dalibo (https://explain.dalibo.com/).

    Once you’ve done that, you can see plainly which exact part of your query is taking the most time.
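
    For example (table and column names here are just placeholders for your actual search query):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT s.*
    FROM sales s
    JOIN clients c ON c.id = s.client_id
    WHERE s.invoice_number ILIKE '%acme%'
       OR concat(c.first_name, ' ', c.last_name) ILIKE '%acme%';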

    That said, if you haven’t indexed the columns that you are searching on, that is step 0. Everything after indexing is an afterthought. Correct indexing will give you 90%+ time saved on many queries. Everything else is on the order of 5 or 10%.

    Note that a regular B-tree Postgres index on a string column will only actually be used by the Postgres query planner if certain conditions are met. Most importantly, it will only be used when searching for substrings at the start of the overall string (i.e. prefix searches).
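
    As an illustration (hypothetical table and index; the text_pattern_ops operator class is what lets a B-tree index support LIKE prefix matching under non-C collations):

    CREATE INDEX idx_clients_last_name ON clients (last_name text_pattern_ops);

    SELECT * FROM clients WHERE last_name LIKE 'smi%';   -- anchored prefix: can use the B-tree index
    SELECT * FROM clients WHERE last_name LIKE '%smi%';  -- leading wildcard: cannot use the B-tree index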

    To get around this, you will need to use a Postgres full text search index. The choice between GIN and GiST indexes is very minor compared to the time saved by simply having an index at all.
