Postgresql - How to index a column for leading wildcard search and check progress?

Ryan
November 12, 2022
235 views
1 vote
2 Answers

My table has 650M rows (according to a fast but decently precise estimate from a query I found here).

It has a text column called receiver_account_id, and I need to be able to search those records like: r.receiver_account_id LIKE '%otherWordsHere'.

Because I’m using a leading wildcard, those searches are impossibly slow. I need an index. And my guess from here is that I need a GIN index.

I ran:

CREATE EXTENSION pg_trgm;
CREATE EXTENSION btree_gin;
CREATE INDEX CONCURRENTLY receipts_receiver_account_id_gin_idx ON public.receipts USING gin (receiver_account_id);

But I’m not sure that the index is even being created.

I ran:

SELECT
  now()::TIME(0),
  a.query,
  p.phase,
  round(p.blocks_done / p.blocks_total::numeric * 100, 2) AS "% done",
  p.blocks_total,
  p.blocks_done,
  p.tuples_total,
  p.tuples_done,
  ai.schemaname,
  ai.relname,
  ai.indexrelname
FROM pg_stat_progress_create_index p
JOIN pg_stat_activity a ON p.pid = a.pid
LEFT JOIN pg_stat_all_indexes ai on ai.relid = p.relid AND ai.indexrelid = p.index_relid;

But I just see <insufficient privilege> (which is bizarre since I own this machine) and a bunch of NULLs.

The next 2 status queries I got from here.

SELECT * FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid; shows:

And then:

SELECT a.datname,
         l.relation::regclass,
         l.transactionid,
         l.mode,
         l.GRANTED,
         a.usename,
         a.query,
         a.query_start,
         age(now(), a.query_start) AS "age",
         a.pid
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
WHERE mode = 'ShareUpdateExclusiveLock'
ORDER BY a.query_start;

Shows:

Am I doing this correctly? How can I know when the index creation will finish?

Answers

- jjanes
- November 12, 2022 at 3:01 am
- 0 votes
0
A pg_trgm index can be used here, but it is unnecessarily slow and large. Better would be a functional index.
```
CREATE INDEX CONCURRENTLY receipts_receiver_account_id_rev_idx ON public.receipts (reverse(receiver_account_id) text_pattern_ops);

SELECT....WHERE reverse(r.receiver_account_id) LIKE reverse('%otherWordsHere')
```
If the pattern contained literal % or _ which are escaped by backslashes, then processing the pattern would need to more complex than just calling reverse() on it.
Login or Signup to reply.

- ErwinBrandstetter
- November 12, 2022 at 10:20 am
- 0 votes
0
Tailored expression index

If queries with a leading wildcard are the only (or the only important) kind of queries on that column, then consider an expression index, like @jjanes suggested. It’s typically (much) smaller and cheaper to maintain than a trigram index, and faster for fitting queries. (It’s far less versatile, though!)

In modern versions of Postgres I would lean towards a COLLATE "C" index instead of text_pattern_ops, though. See:
- Is there a difference between text_pattern_ops and COLLATE "C"?
There is no indication in your question but, typically, you want to search case-insensitive. So I add lower() to the expression to arrive at:
```
CREATE INDEX CONCURRENTLY receipts_receiver_account_id_rev_idx
ON public.receipts (lower(reverse(receiver_account_id)) COLLATE "C");
```
CONCURRENTLY only if you need to avoid an exclusive lock on the table. Else, it’s faster without.

Match the expression in queries:
```
... WHERE lower(reverse(receiver_account_id))
     LIKE lower(reverse('otherWordsHere'   )) || '%' COLLATE "C";
```
Note how I concatenate the wildcard to the right explicitly. That allows Postgres to use the index even for generic query plans with parameterized ‘otherWordsHere’.

Or, faster yet, use the "starts with" operator ^@ in Postgres 15 or later:
```
... WHERE lower(reverse(receiver_account_id))
       ^@ lower(reverse('otherWordsHere'   )) COLLATE "C";
```
No wildcard. And no hurdles for generic query plans, either. See:
- PostgreSQL LIKE query performance variations
The minor downside of using COLLATE "C" instead of text_pattern_ops is that you have to spell out COLLATE "C" in queries to match the index. But while you have to match the "reverse" expression exactly anyway …

Related:
- Search with additional words at the beginning
If we know more about the % part and the otherWordsHere part in your pattern ‘%otherWordsHere’, like length or constant bits, we might be able to optimize further.

Your failed attempt at a trigram index

You don’t need the additional module btree_gin to create a trigram index on a string-type column. Just pg_trgm.
But you forgot to declare the needed operator class:
```
CREATE EXTENSION pg_trgm;
CREATE INDEX CONCURRENTLY receipts_receiver_account_id_gin_idx ON public.receipts
USING gin (receiver_account_id gin_trgm_ops);
```
You may still want that index to cover a variety of patterns …

Escaping special characters in LIKE patterns

See:
- Escape function for regular expression or LIKE patterns
Tracking progress

How can I know when the index creation will finish?

Since Postgres 12, you can consult pg_stat_progress_create_index about progress. See:
- Feedback on whether index was created on materialized views in postgresql
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – How to index a column for leading wildcard search and check progress?

Answers

Tailored expression index

Your failed attempt at a trigram index

Escaping special characters in `LIKE` patterns

Tracking progress

Postgresql – How to index a column for leading wildcard search and check progress?

Answers

Tailored expression index

Your failed attempt at a trigram index

Escaping special characters in LIKE patterns

Tracking progress

Escaping special characters in `LIKE` patterns