skip to Main Content

I am querying a table with around 4M data with two types of skuID and using order by on version as there are around avg 5k versions of data for one skuID and fetching the top most versions using limit.

Query:

 select * FROM table rb 
    WHERE rb.sku_id='' or rb.package_sku_id=''
    order by version desc
limit 1;

By using explain on query we found out that sorting is taking most of the cost for the query:

Limit  (cost=0.43..5304.64 rows=1 width=861) (actual time=50327.036..50327.041 rows=1 loops=1)
  Buffers: shared hit=361280 read=104302 written=18
  I/O Timings: read=40363.693 write=0.215
  ->  Index Scan Backward using "IDX488yr43nr28a1yml9lb5i7jfv" on referral_benefits rb  (cost=0.43..9552890.48 rows=1801 width=861) (actual time=50327.028..50327.028 rows=1 loops=1)
        Filter: (((sku_id)::text = 'b1d5fa77-fda8-466c-b8b0-a8e60d1f78a5'::text) OR ((package_sku_id)::text = 'b1d5fa77-fda8-466c-b8b0-a8e60d1f78a5'::text))
        Rows Removed by Filter: 1361027
        Buffers: shared hit=361280 read=104302 written=18
        I/O Timings: read=40363.693 write=0.215
Planning Time: 1.121 ms
Execution Time: 50329.843 ms

We have added indexes on skuId and package_sku_id combined but that is not reducing the time.

Need guidance on this.
Thanks in advance

2

Answers


  1. If you have two indexes, one on (sku_id, version) and one on (package_sku_id, version), then you could get very fast execution by doing:

    (select * FROM rb WHERE rb.sku_id='b1d5fa77-fda8-466c-b8b0-a8e60d1f78a5' order by version desc) 
        union all
    (select * FROM rb WHERE rb.package_sku_id='b1d5fa77-fda8-466c-b8b0-a8e60d1f78a5' order by version desc)
    order by version desc limit 1;
    

    But note that if you increase the LIMIT beyond 1, then this query might return duplicate rows if both SKU conditions are satisfied for the same row.

    Login or Signup to reply.
  2. PostgreSQL features partial indexes, i.e. indexes that only apply to a subset of the table. Use that index type and have it only contain the rows you want to look at. Then the DBMS only has to read that index and the data comes out sorted:

    create index idx on mytable (version desc)
                     where sku_id = '' or package_sku_id = '';
    

    Another option would be a separate table only containing the desired top row. Write a trigger to keep that up-to-date.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search