I’m trying to use a compound index on a table to assist in creating daily report counts. My table looks like this:
CREATE TABLE inquiries (
id bigint NOT NULL,
identity_id bigint NOT NULL,
received_at timestamp(0) without time zone NOT NULL,
purpose_id bigint NOT NULL,
location_id bigint NOT NULL
);
CREATE INDEX "inquiries_DATE_index" ON inquiries USING btree (date(received_at), location_id, purpose_id, identity_id);
My query looks like this:
SELECT DATE(received_at), location_id, purpose_id, COUNT(DISTINCT identity_id)
FROM inquiries
WHERE (DATE(received_at) >= $1)
AND (DATE(received_at) <= $2)
GROUP BY 1, 2, 3
Explain output looks like this:
GroupAggregate (cost=43703.28..45785.49 rows=10950 width=19)
Group Key: (date(received_at)), location_id, purpose_id
-> Sort (cost=43703.28..44092.34 rows=155627 width=16)
Sort Key: (date(received_at)), location_id, purpose_id
-> Bitmap Heap Scan on inquiries (cost=5243.60..27622.21 rows=155627 width=16)
Recheck Cond: ((date(received_at) >= '2023-11-01'::date) AND (date(received_at) <= '2023-11-30'::date))
-> Bitmap Index Scan on "inquiries_DATE_index" (cost=0.00..5204.70 rows=155627 width=0)
Index Cond: ((date(received_at) >= '2023-11-01'::date) AND (date(received_at) <= '2023-11-30'::date))
The index doesn’t really seem to be helping as the query takes a long time to execute. If I add a date column to the table and use that instead of date(received_at) then the query works better and the explain syntax looks like the following:
GroupAggregate (cost=0.43..85199.58 rows=10980 width=19)
Group Key: pacific_date, location_id, purpose_id
-> Index Only Scan using inquiries_pacific_date_index on inquiries (cost=0.43..77813.12 rows=727666 width=16)
Index Cond: ((pacific_date >= '2023-11-01'::date) AND (pacific_date <= '2023-11-30'::date))
I guess I can do this if I can’t find a better way but it seems kind of redundant. Is there a way I can write my original query so it makes use of the index? Using postgres 13
2
Answers
The problem is that due to a limitation in PostgreSQL, it won’t use an index-only scan with your index, because the index contains an expression. You’d have to add
received_at
to the index (in addition todate(received_at)
) to make it work.Drop-in fix
Like Laurenz explained, index-only scans currently (pg 16) suffer from a corner-case limitation in Postgres. The manual:
The manual has more details. One workaround is to "include" the column itself in the index:
Should give you index-only scans for your original query. But it also makes increases the size of the index (by 8 bytes per row in your case).
If the columns (location_id, purpose_id, identity_id) serve no additional purposes in the index, for the query at hand,those can move to the INCLUDE section as well:
Should help some more with a big table such as yours ("9B rows").
Better
Create an index on bare columns, without expressions:
And adjust your query slightly, to be exactly equivalent:
Input
$1
and$2
must be typedate
, as indicated in the question.