skip to Main Content

I am running a simple join query, where the table incident_tracker_details is covered by a suitable index. The query is slow because it doesn`t use all the columns in Index Condition, but does use the column as a Filter leading to a large heap searches

Query:

SELECT
    relevant_incidents_per_inventory_item.inventory_id,
    tracker_id
FROM
    relevant_incidents_per_inventory_item
JOIN
    incident_tracker_details
ON
    incident_tracker_details.incident_id = relevant_incidents_per_inventory_item.incident_id
WHERE
    incident_tracker_details.company_id = :companyId
    and incident_tracker_details.division_id = :divisionId
    AND incident_tracker_details.value > 0

the index in use:

create index companyid_divisionId_incidentId_idx
on incident_tracker_details (company_id, division_id, incident_id)
where (value > 0) include (tracker_id); 

the relevant_incidents_per_inventory_item does not have any relevant indexes.

the actual query plan:

QUERY PLAN
    Hash Join  (cost=6901.58..6913.72 rows=1 width=8) (actual time=90.663..315.278 rows=522 loops=1)
      Hash Cond: (incident_tracker_details.incident_id = relevant_incidents_per_inventory_item.incident_id)
      ->  Index Only Scan using companyid_divisionId_incidentId_idx on incident_tracker_details  (cost=0.57..12.68 rows=6 width=16) (actual time=0.039..202.787 rows=246774 loops=1)
            Index Cond: ((company_id = '777704491747470679'::bigint) AND (division_id = '777795770460846005'::bigint))
            Heap Fetches: 124039
      ->  Hash  (cost=6900.88..6900.88 rows=10 width=8) (actual time=86.704..86.709 rows=259 loops=1)
            ...not relevant part
    Planning Time: 0.518 ms
    Execution Time: 315.402 ms

Why is the "incident_id" not part of the Index Cond? this leads to retrieving all the rows under companyId, divisionId , and then the engine has to filter all the rows for those having incident_tracker_details.incident_id = relevant_incidents_per_inventory_item.incident_id

I expected to see all the columns used in the Index Condition, leading to fast retrieval of only the relevant data

2

Answers


  1. Chosen as BEST ANSWER

    I tried to find the answer to the incomplete index usage behaviour, and in order to simplify the case I removed the join condition and built a simple query targeting the companyId, divisionId, incidentId index. What I discovered is that when the query has a single incident_id value, the index is used fully and all columns are in Index Cond. On the other hand, when I have multiple (more than 1) incident_ids either in OR or in IN form, the index is used partially what leads to the slow Heap Fetches as seen before.

    Simplified query:

    SELECT
        tracker_id
    FROM
        bi.incident_tracker_details
    WHERE
        company_id = :companyId
        AND division_id = :divisionId
            (incident_id = 234234234234234234 OR incident_id = 333333333333333333)   
        AND value > 0;
    

    Query plan for 2 incident_id values using OR. Notice the Heap Fetches, and index on Index Cond only company_id and division_id columns, the incident_id is in Filter.

        QUERY PLAN
        Index Only Scan using companyid_divisionId_incident_id_idx on incident_tracker_details  (cost=0.57..8.72 rows=1 width=8) (actual time=103.485..103.485 rows=0 loops=1)
          Index Cond: ((company_id = '4478704491747470679'::bigint) AND (division_id = '4514095770460846005'::bigint))
          Filter: ((incident_id = '234234234234234234'::bigint) OR (incident_id = '333333333333333333'::bigint))
          Rows Removed by Filter: 249705
          Heap Fetches: 20397
        Planning Time: 0.155 ms
        Execution Time: 103.512 ms
    

    On the other hand, same query with a single incident_id value. The Heap Fetches is 0 and the execution time is almost 0:

        QUERY PLAN
        Index Only Scan using companyid_divisionId_incident_id_idx on incident_tracker_details  (cost=0.57..8.59 rows=1 width=8) (actual time=0.019..0.019 rows=0 loops=1)
          Index Cond: ((company_id = '4478704491747470679'::bigint) 
                        AND (division_id = '4514095770460846005'::bigint) 
                        AND (incident_id = '234234234234234234'::bigint))
          Heap Fetches: 0
        Planning Time: 0.125 ms
        Execution Time: 0.042 ms
    

    this leads me to believe that the behaviour we seen in the original question and in this simplified case is a result of planner wrongfully deciding to scan the incident_id level instead of traversing the tree for each incident_id value. I will glad to understand why it does so though...


  2. The reason incident_id isn’t part of Index Cond: is because it isn’t part of the selection criteria associated with incident_tracker_details. The only indexed columns mentioned in the WHERE clause are company_id and division_id. Although incident_id is used in the join criteria, it is not part of the selection criteria and thus has no part in the conditions used for the index only scan. The query engine has to get all incident_id values associated with the selection criteria to identify the rows in relevant_incidents_per_inventory_item that satisfy the join condition.

    If relevant_incidents_per_inventory_item has an index with incident_id as its first column, then the join criteria can be checked with an index scan. If there is no such index, then the join condition will be evaluated by using a full table scan and filtering for rows that have incident_id values matching those found by the index only scan of companyid_divisionId_incidentId_idx. Since you elided the portion of the explain plan associated with relevant_incidents_per_inventory_item and didn’t include all relevant DDL in the post, there isn’t any way for us to determine which of these was the case.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search