I am running a simple join query, where the table incident_tracker_details is covered by a suitable index. The query is slow because it doesn`t use all the columns in Index Condition, but does use the column as a Filter leading to a large heap searches
Query:
SELECT
relevant_incidents_per_inventory_item.inventory_id,
tracker_id
FROM
relevant_incidents_per_inventory_item
JOIN
incident_tracker_details
ON
incident_tracker_details.incident_id = relevant_incidents_per_inventory_item.incident_id
WHERE
incident_tracker_details.company_id = :companyId
and incident_tracker_details.division_id = :divisionId
AND incident_tracker_details.value > 0
the index in use:
create index companyid_divisionId_incidentId_idx
on incident_tracker_details (company_id, division_id, incident_id)
where (value > 0) include (tracker_id);
the relevant_incidents_per_inventory_item
does not have any relevant indexes.
the actual query plan:
QUERY PLAN
Hash Join (cost=6901.58..6913.72 rows=1 width=8) (actual time=90.663..315.278 rows=522 loops=1)
Hash Cond: (incident_tracker_details.incident_id = relevant_incidents_per_inventory_item.incident_id)
-> Index Only Scan using companyid_divisionId_incidentId_idx on incident_tracker_details (cost=0.57..12.68 rows=6 width=16) (actual time=0.039..202.787 rows=246774 loops=1)
Index Cond: ((company_id = '777704491747470679'::bigint) AND (division_id = '777795770460846005'::bigint))
Heap Fetches: 124039
-> Hash (cost=6900.88..6900.88 rows=10 width=8) (actual time=86.704..86.709 rows=259 loops=1)
...not relevant part
Planning Time: 0.518 ms
Execution Time: 315.402 ms
Why is the "incident_id" not part of the Index Cond? this leads to retrieving all the rows under companyId, divisionId , and then the engine has to filter all the rows for those having incident_tracker_details.incident_id = relevant_incidents_per_inventory_item.incident_id
I expected to see all the columns used in the Index Condition, leading to fast retrieval of only the relevant data
2
Answers
I tried to find the answer to the incomplete index usage behaviour, and in order to simplify the case I removed the join condition and built a simple query targeting the
companyId, divisionId, incidentId
index. What I discovered is that when the query has a singleincident_id
value, the index is used fully and all columns are inIndex Cond
. On the other hand, when I have multiple (more than 1)incident_id
s either in OR or in IN form, the index is used partially what leads to the slow Heap Fetches as seen before.Simplified query:
Query plan for 2
incident_id
values using OR. Notice the Heap Fetches, and index onIndex Cond
onlycompany_id
anddivision_id
columns, theincident_id
is inFilter
.On the other hand, same query with a single
incident_id
value. The Heap Fetches is 0 and the execution time is almost 0:this leads me to believe that the behaviour we seen in the original question and in this simplified case is a result of planner wrongfully deciding to scan the
incident_id
level instead of traversing the tree for each incident_id value. I will glad to understand why it does so though...The reason
incident_id
isn’t part ofIndex Cond:
is because it isn’t part of the selection criteria associated withincident_tracker_details
. The only indexed columns mentioned in theWHERE
clause arecompany_id
anddivision_id
. Althoughincident_id
is used in the join criteria, it is not part of the selection criteria and thus has no part in the conditions used for the index only scan. The query engine has to get allincident_id
values associated with the selection criteria to identify the rows inrelevant_incidents_per_inventory_item
that satisfy the join condition.If
relevant_incidents_per_inventory_item
has an index withincident_id
as its first column, then the join criteria can be checked with an index scan. If there is no such index, then the join condition will be evaluated by using a full table scan and filtering for rows that haveincident_id
values matching those found by the index only scan ofcompanyid_divisionId_incidentId_idx
. Since you elided the portion of the explain plan associated withrelevant_incidents_per_inventory_item
and didn’t include all relevant DDL in the post, there isn’t any way for us to determine which of these was the case.