I have two tables T1 and T2. I’m doing a simple inner join
select t1.a, t1.b, t1.c
from T1 t1
inner join T2 t2 on t1.c = t2.c
Table T1 has got 2 million rows and T2 has got 4 million records
explain analyse of the query / execution time takes approximately 3 seconds.
I have indexes for the column c on T1 as well as on T2
What can I do to improve this or if there is any alternate way of writing this query? I do not need to select columns from T2.
appreciate any help.
2
Answers
Depending on the requirements, table partitioning can be a solution.
https://www.postgresql.org/docs/current/ddl-partitioning.html
That heavily depends on the characteristics of the data and traffic on the tables. Before you share your table and index definitions along with a
explain (analyze,buffers,verbose)
output, guessing purely based on your mention of basic, single-column index aiding the operation: you could look into tuning some additional settings of those indexes. I’m assuming you’re already getting an index-only scan ont2
and a regular index scan ont1
.The 2.5s above is close to your current score. If you
include
the other two columns you’re fetching as payload on the index, you can get it to run index-only scans for both tables (finds everything in the indexes, doesn’t have to jump to actual tables) – in my case that gives a >4x speedup down to 0.6s:If your data is completely static or only refreshed/reloaded in full, you can also compact the indexes: by default they leave 10% empty space to accommodate incoming rows, which those tables don’t need.
If heap fetches are unavoidable or your payload is too big, you can consider
cluster
: