My table has 650M rows (according to a fast but decently precise estimate from a query I found here).
It has a text column called receiver_account_id
, and I need to be able to search those records like: r.receiver_account_id LIKE '%otherWordsHere'
.
Because I’m using a leading wildcard, those searches are impossibly slow. I need an index. And my guess from here is that I need a GIN index.
I ran:
CREATE EXTENSION pg_trgm;
CREATE EXTENSION btree_gin;
CREATE INDEX CONCURRENTLY receipts_receiver_account_id_gin_idx ON public.receipts USING gin (receiver_account_id);
But I’m not sure that the index is even being created.
I ran:
SELECT
now()::TIME(0),
a.query,
p.phase,
round(p.blocks_done / p.blocks_total::numeric * 100, 2) AS "% done",
p.blocks_total,
p.blocks_done,
p.tuples_total,
p.tuples_done,
ai.schemaname,
ai.relname,
ai.indexrelname
FROM pg_stat_progress_create_index p
JOIN pg_stat_activity a ON p.pid = a.pid
LEFT JOIN pg_stat_all_indexes ai on ai.relid = p.relid AND ai.indexrelid = p.index_relid;
But I just see <insufficient privilege>
(which is bizarre since I own this machine) and a bunch of NULL
s.
The next 2 status queries I got from here.
SELECT * FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid;
shows:
And then:
SELECT a.datname,
l.relation::regclass,
l.transactionid,
l.mode,
l.GRANTED,
a.usename,
a.query,
a.query_start,
age(now(), a.query_start) AS "age",
a.pid
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
WHERE mode = 'ShareUpdateExclusiveLock'
ORDER BY a.query_start;
Am I doing this correctly? How can I know when the index creation will finish?
2
Answers
A pg_trgm index can be used here, but it is unnecessarily slow and large. Better would be a functional index.
If the pattern contained literal % or _ which are escaped by backslashes, then processing the pattern would need to more complex than just calling reverse() on it.
Tailored expression index
If queries with a leading wildcard are the only (or the only important) kind of queries on that column, then consider an expression index, like @jjanes suggested. It’s typically (much) smaller and cheaper to maintain than a trigram index, and faster for fitting queries. (It’s far less versatile, though!)
In modern versions of Postgres I would lean towards a
COLLATE "C"
index instead oftext_pattern_ops
, though. See:There is no indication in your question but, typically, you want to search case-insensitive. So I add
lower()
to the expression to arrive at:CONCURRENTLY
only if you need to avoid an exclusive lock on the table. Else, it’s faster without.Match the expression in queries:
Note how I concatenate the wildcard to the right explicitly. That allows Postgres to use the index even for generic query plans with parameterized ‘otherWordsHere’.
Or, faster yet, use the "starts with" operator
^@
in Postgres 15 or later:No wildcard. And no hurdles for generic query plans, either. See:
The minor downside of using
COLLATE "C"
instead oftext_pattern_ops
is that you have to spell outCOLLATE "C"
in queries to match the index. But while you have to match the "reverse" expression exactly anyway …Related:
If we know more about the
%
part and theotherWordsHere
part in your pattern ‘%otherWordsHere’, like length or constant bits, we might be able to optimize further.Your failed attempt at a trigram index
You don’t need the additional module
btree_gin
to create a trigram index on a string-type column. Justpg_trgm
.But you forgot to declare the needed operator class:
You may still want that index to cover a variety of patterns …
Escaping special characters in
LIKE
patternsSee:
Tracking progress
Since Postgres 12, you can consult
pg_stat_progress_create_index
about progress. See: