Postgresql - How to search for the acronym of a string column in an indexable way

Daniel
August 30, 2024
202 views
0 votes
2 Answers

I’m working on an express.js project, using Sequelize as the orm, and postgres as the engine.

I’m working on improving the performance of a slow select query that does many things, and one of them is comparing the acronym of the name of the warehouse against a search term using ilike, so if the search term is "STW", a warehouse named "Super Test Warehouse" will match. The query looks like this:

SELECT field1, field2, ...
FROM package p
LEFT OUTER JOIN warehouse w ON p.warehouse_id = w.id
-- many other joins
WHERE
    regexp_replace(w.name, '([a-z ])+', '', 'g') ILIKE '%search term%'
    -- many other conditions

The thing is, the main improvement I’m doing is creating indexes (gin for everything that does ilike '%something%'), and refactoring the query to be able to use them. But the regexp_replace() function doesn’t work with indexes. Another important thing to note is that this solution only works if all the words in the names are correctly capitalized, which is the case with most of our data, but not all, so if there’s a warehouse named "Super TEST Warehouse", or "super test Warehouse", it won’t work correctly.

A simple solution I thought of, is to do this (assuming the search text is "STW"):

w.name ILIKE 'S% T% W%' OR w.name ILIKE '% S% T% W%'

It works on the average case, and it uses the gin index I created. But the problem is that it allows other words in between. So "STW" would also match "Super Fake Test Warehouse".

Is there any condition that would match the acronym correctly, and would use an index?

Would you recommend using a generated column for this (assuming I improve the existing regexp_replace, or use something else)? Are there any downsides of doing that? Ideally, I don’t want to alter a table only for this feature, but could be a good solution if there’s nothing else we can do.

Any other solutions?

Answers

Chosen as BEST ANSWER
- Daniel
- August 30, 2024 at 7:03 am
- 0 votes
0
What I'm doing in the end is making a generated column like this:
```
ALTER TABLE warehouse
ADD COLUMN name_acronym TEXT GENERATED ALWAYS AS (
    array_to_string(
        array(
            SELECT substring(word FROM 1 FOR 1)
            FROM unnest(string_to_array(name, ' ')) AS word
        ),
        ''
    )
) STORED;
```
I also have this index for name:
```
CREATE INDEX warehouse_name ON warehouse USING gin ("name" gin_trgm_ops);
```
First, this solution is more accurate for the acronyms, since it actually takes the first letter of each word. And also, ilike queries against the generated column will make use the index.

(Edit)

- ErwinBrandstetter
- August 30, 2024 at 1:30 am
- 0 votes
0
Ideally, I don’t want to alter a table only for this feature

So create an expression index:
```
CREATE INDEX foo ON package (upper(regexp_replace(name, 'W*?m(.).*?M', '1', 'g')));
```
The expression extracts upper-case acronyms reliably (except for trailing non-word characters in your strings):
```
SELECT upper(regexp_replace('SUPER ware hOUse', 'W*?m(.).*?M', '1', 'g'));  -- SWH
```
The involved functions are immutable, so fit for an expression index.

Repeat the expression in your queries to bring in the index:
```
SELECT *
FROM   package p
WHERE  upper(regexp_replace(name, 'W*?m(.).*?M', '1', 'g')) = 'SWH';
```
I would prefer to add a generated column, though. Simpler query and a bit faster, yet, for only very few added bytes per row. See:
- Computed / calculated / virtual / derived / generated columns in PostgreSQL
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – How to search for the acronym of a string column in an indexable way

Answers