skip to Main Content

I have huge words table which I’m running LIKE query on:

create table words
(
  id   int,
  word varchar
)

It works pretty long. Index doesn’t help a lot, so I’m trying to partition it by word column:

create table words
(
  id   int,
  word varchar
) partition by RANGE (word);

CREATE TABLE words_1 PARTITION OF words
  FOR VALUES FROM ('a') TO ('n');

CREATE TABLE words_2 PARTITION OF words
  FOR VALUES FROM ('n') TO ('z');

NOTE: Actually I’m planning to make 1 partition for each letter. Use only 2 of them for example simplicity.

So partitioning seems to work OK with equality and gt/lt operators:

explain
select * from words where word = 'abc'
Seq Scan on words_1 words  (cost=0.00..25.88 rows=6 width=36)
  Filter: ((word)::text = 'abc'::text)
explain
select * from words where word >= 'nth'
Seq Scan on words_2 words  (cost=0.00..25.88 rows=423 width=36)
  Filter: ((word)::text >= 'nth'::text)

But on LIKE queries it keeps scanning both partitions:

explain
select * from words where word LIKE 'abc%'
Append  (cost=0.00..51.81 rows=12 width=36)
  ->  Seq Scan on words_1  (cost=0.00..25.88 rows=6 width=36)
        Filter: ((word)::text ~~ 'abc'::text)
  ->  Seq Scan on words_2  (cost=0.00..25.88 rows=6 width=36)
        Filter: ((word)::text ~~ 'abc'::text)

Is there a way to make partitioning work on LIKE queries?

Maybe is there another way to achieve what i want?

2

Answers


  1. I cannot replicate your results. Your queries use an index, even with only a handful of rows. Make sure your tables have been analyzed and run your queries with explain analyze.


    An improved table design would drop the id and use the word as a primary key, assuming they’re unique. I’ve added about 1000 words and analyzed the table.

    create table words ( word text primary key );
    
    copy words(word) from '/Users/schwern/tmp/words.txt';
    
    analyze words;
    

    All of your queries do an index-only scan.

    The default B-Tree index can do exact matches (words = 'this'), trailing wildcards (words like 'this%') and ordering. We can improve this further adding a Gist index using trigram ops.

    create index word_gin_idx on words using gist(word gist_trgm_ops);
    

    Now queries such as word like '%this%' will use the Gist index.

    Login or Signup to reply.
  2. I’m a bit surprised it doesn’t just work, at least in the C collation. But I can verify that it doesn’t.

    You could rewrite the query manually the same way word like 'abc%' sometimes gets rewritten:

    explain analyze
    select * from words where word >='abc' and word <'abd'
    

    But this is only guaranteed to give the same answer in the C collation.

    By the way, you should check partition pruning with EXPLAIN ANALYZE. It is possible the partition pruning to only happen at run time, in which case all partitions still show up in the plan EXPLAIN plan. (But run-time pruning isn’t the case here, I checked)

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search