skip to Main Content

My system is using PostgreSQL as the database. I am encountering an issue when using a query to join two large tables together. I’ll briefly describe the tables as follows:

Location:
 - id: uuid
 - name: string (indexes)
 - country: string (indexes)
 - number: string

Product:
 - id: uuid
 - name: string
 - score: number (indexes)
 - rate: number (indexes)
 - report: number (indexes)
 - lock: boolean (indexes)
 - location_id: uuid (not null) (indexes)


Location and Product will be 1-1 relation unique. Location can be has product or not.

I used the following query:

select l.id, l.name, l.number, p.id as pId, p.name, p.score, p.rate
from location l
left join product p on p.location_id = l.id
where l.country = 'US'
and l.id < 'xxx-yyy-zzz'
and (p.name is not null or l.name is not null)
and p.score > 1 and p.rate > 4 and p.lock = false
order by id desc
limit 100

Note that both the product and location tables have a very large number of records. Each table contains around several million to tens of millions of records. How can I speed up the query execution? For some countries, it runs very fast, but for others, it runs slowly, even though there are fewer records.

2

Answers


  1. You can try creating indicies on:

    1. Create an index on location(country, id) to help with the filtering by l.country and l.id.
    2. Create an index on product(location_id, score, rate, lock) to assist with the join and filtering on p.
    CREATE INDEX idx_location_country_id ON location(country, id);
    CREATE INDEX idx_product_location_score_rate_lock ON product(location_id, score, rate, lock);
    
    

    Note:

    Filter on p.lock: In some cases, p.lock = false can be an expensive condition if there is no index on this column. You may need to ensure that p.lock is indexed.

    Avoid Excessive NULL Checks: Since p.name IS NOT NULL and l.name IS NOT NULL are conditions that could be expensive, ensure that columns that are frequently queried for NULL checks are indexed.

    Limit Clause Placement: The LIMIT 100 is correctly placed to limit the results after the sorting operation.

    Further you can also try table partitioning based on you usage

    Login or Signup to reply.
  2. Move the right table’s conditions into the join clause:

    select
      l.id, l.name, l.number, p.id as pId, p.name, p.score, p.rate
    from location l
    left join product p on p.location_id = l.id
      and (p.name is not null or l.name is not null)
      and p.score > 1 and p.rate > 4 and p.lock = false
    where l.country = 'US'
      and l.id < 'xxx-yyy-zzz'
    order by id desc
    limit 100
    

    and create a covering index (has all columns needed, in optimal order) on product:

    create index product_1 on product(location_id, lock, score, rate, name);
    

    and this index on location is assist both where and order by clauses:

    create index idx_location_country_id on location(country, id desc);
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search