skip to Main Content

In Postgres 16.2 I’ve a table Product with extra index:

CREATE TABLE Product (id   SERIAL , name: text, companyId: number, PRIMARY KEY (id));
CREATE INDEX product_company_id ON Product (companyId);

The table has about 12M rows.

When I search for a row with companyId that exists in the table/index the right "product_company_id" index is used and execution is about 100ms.

select id, name, companyId from Product where companyId=12 order by id limit 1;

But when I search for a row with non existent companyId the execution is very slow about 7s. According to Explain/Analyze the index on companyId is not used:

Limit  (cost=0.09..208.24 rows=1 width=43) (actual time=6404.660..6404.661 rows=0 loops=1)
  ->  Index Scan Backward using "Product_pkey" on "Product"  (cost=0.09..562625.56 rows=2703 width=43) (actual time=6404.659..6404.659 rows=0 loops=1)
        Filter: (companyid = 667)
        Rows Removed by Filter: 11797182
Planning Time: 0.100 ms
Execution Time: 6404.674 ms

Removing the "limit" fixes the problem and index is used again. But the limit is important.

Is there any way to force the DB to use the index to find out there are no matching rows?

2

Answers


  1. Yes, you can have a two-dimensional index of the form of:

    CREATE INDEX product_company_id ON Product ((companyId / 1000000),companyId);
    

    that is, the first dimension of the index is generating clusters of slots for 1000000-partitions and if therefore you filter by companyId being 12, it is having a 0 in the first index and looks for the million of companies in that range rather than the entirety of the table. You can of course change the 1000000 to some other number.

    Fiddle: https://www.db-fiddle.com/f/ueAq14fNuYbq7Wy9TJXxFL/0

    Running

    explain select * from Product where CompanyId = 12
    

    yields:

    QUERY PLAN
    Bitmap Heap Scan on product (cost=13.15..22.62 rows=6 width=40)
    Recheck Cond: (companyid = 12)
    -> Bitmap Index Scan on product_company_id (cost=0.00..13.15 rows=6 width=0)
    Index Cond: (companyid = 12)
    
    Login or Signup to reply.
  2. PostgreSQL cannot know with certainty that no rows will match that given companyId. In fact it thinks that 2703 will. You could try to improve that estimate by making sure ANALYZE has been run on the table, or by increasing the stats target for that column and then running ANALYZE.

    Or, you could just create a more robust index, for example one on (companyId, id). This index will still look attractive and so probably be used even if the row estimate is wrong.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search