skip to Main Content

My table has 650M rows (according to a fast but decently precise estimate from a query I found here).

It has a text column called receiver_account_id, and I need to be able to search those records like: r.receiver_account_id LIKE '%otherWordsHere'.

Because I’m using a leading wildcard, those searches are impossibly slow. I need an index. And my guess from here is that I need a GIN index.

I ran:

CREATE EXTENSION pg_trgm;
CREATE EXTENSION btree_gin;
CREATE INDEX CONCURRENTLY receipts_receiver_account_id_gin_idx ON public.receipts USING gin (receiver_account_id);

But I’m not sure that the index is even being created.

I ran:

SELECT
  now()::TIME(0),
  a.query,
  p.phase,
  round(p.blocks_done / p.blocks_total::numeric * 100, 2) AS "% done",
  p.blocks_total,
  p.blocks_done,
  p.tuples_total,
  p.tuples_done,
  ai.schemaname,
  ai.relname,
  ai.indexrelname
FROM pg_stat_progress_create_index p
JOIN pg_stat_activity a ON p.pid = a.pid
LEFT JOIN pg_stat_all_indexes ai on ai.relid = p.relid AND ai.indexrelid = p.index_relid;

But I just see <insufficient privilege> (which is bizarre since I own this machine) and a bunch of NULLs.

The next 2 status queries I got from here.

SELECT * FROM pg_class, pg_index WHERE pg_index.indisvalid = false AND pg_index.indexrelid = pg_class.oid; shows:

screenshot1

And then:

SELECT a.datname,
         l.relation::regclass,
         l.transactionid,
         l.mode,
         l.GRANTED,
         a.usename,
         a.query,
         a.query_start,
         age(now(), a.query_start) AS "age",
         a.pid
FROM pg_stat_activity a
JOIN pg_locks l ON l.pid = a.pid
WHERE mode = 'ShareUpdateExclusiveLock'
ORDER BY a.query_start;

Shows:
screenshot2

Am I doing this correctly? How can I know when the index creation will finish?

2

Answers


  1. A pg_trgm index can be used here, but it is unnecessarily slow and large. Better would be a functional index.

    CREATE INDEX CONCURRENTLY receipts_receiver_account_id_rev_idx ON public.receipts (reverse(receiver_account_id) text_pattern_ops);
    
    SELECT....WHERE reverse(r.receiver_account_id) LIKE reverse('%otherWordsHere')
    

    If the pattern contained literal % or _ which are escaped by backslashes, then processing the pattern would need to more complex than just calling reverse() on it.

    Login or Signup to reply.
  2. Tailored expression index

    If queries with a leading wildcard are the only (or the only important) kind of queries on that column, then consider an expression index, like @jjanes suggested. It’s typically (much) smaller and cheaper to maintain than a trigram index, and faster for fitting queries. (It’s far less versatile, though!)

    In modern versions of Postgres I would lean towards a COLLATE "C" index instead of text_pattern_ops, though. See:

    There is no indication in your question but, typically, you want to search case-insensitive. So I add lower() to the expression to arrive at:

    CREATE INDEX CONCURRENTLY receipts_receiver_account_id_rev_idx
    ON public.receipts (lower(reverse(receiver_account_id)) COLLATE "C");
    

    CONCURRENTLY only if you need to avoid an exclusive lock on the table. Else, it’s faster without.

    Match the expression in queries:

    ... WHERE lower(reverse(receiver_account_id))
         LIKE lower(reverse('otherWordsHere'   )) || '%' COLLATE "C";
    

    Note how I concatenate the wildcard to the right explicitly. That allows Postgres to use the index even for generic query plans with parameterized ‘otherWordsHere’.

    Or, faster yet, use the "starts with" operator ^@ in Postgres 15 or later:

    ... WHERE lower(reverse(receiver_account_id))
           ^@ lower(reverse('otherWordsHere'   )) COLLATE "C";
    

    No wildcard. And no hurdles for generic query plans, either. See:

    The minor downside of using COLLATE "C" instead of text_pattern_ops is that you have to spell out COLLATE "C" in queries to match the index. But while you have to match the "reverse" expression exactly anyway …

    Related:

    If we know more about the % part and the otherWordsHere part in your pattern ‘%otherWordsHere’, like length or constant bits, we might be able to optimize further.

    Your failed attempt at a trigram index

    You don’t need the additional module btree_gin to create a trigram index on a string-type column. Just pg_trgm.
    But you forgot to declare the needed operator class:

    CREATE EXTENSION pg_trgm;
    CREATE INDEX CONCURRENTLY receipts_receiver_account_id_gin_idx ON public.receipts
    USING gin (receiver_account_id gin_trgm_ops);

    You may still want that index to cover a variety of patterns …

    Escaping special characters in LIKE patterns

    See:

    Tracking progress

    How can I know when the index creation will finish?

    Since Postgres 12, you can consult pg_stat_progress_create_index about progress. See:

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search