Postgresql - Compound index with date function doesn't seem to work

BradPardee
November 22, 2023
255 views
1 vote
2 Answers

I’m trying to use a compound index on a table to assist in creating daily report counts. My table looks like this:

CREATE TABLE inquiries (
    id bigint NOT NULL,
    identity_id bigint NOT NULL,
    received_at timestamp(0) without time zone NOT NULL,
    purpose_id bigint NOT NULL,
    location_id bigint NOT NULL
);
CREATE INDEX "inquiries_DATE_index" ON inquiries USING btree (date(received_at), location_id, purpose_id, identity_id);

My query looks like this:

SELECT DATE(received_at), location_id, purpose_id, COUNT(DISTINCT identity_id)
FROM inquiries
WHERE (DATE(received_at) >= $1)
  AND (DATE(received_at) <= $2)
GROUP BY 1, 2, 3

Explain output looks like this:

GroupAggregate  (cost=43703.28..45785.49 rows=10950 width=19)
  Group Key: (date(received_at)), location_id, purpose_id
  ->  Sort  (cost=43703.28..44092.34 rows=155627 width=16)
        Sort Key: (date(received_at)), location_id, purpose_id
        ->  Bitmap Heap Scan on inquiries  (cost=5243.60..27622.21 rows=155627 width=16)
              Recheck Cond: ((date(received_at) >= '2023-11-01'::date) AND (date(received_at) <= '2023-11-30'::date))
              ->  Bitmap Index Scan on "inquiries_DATE_index"  (cost=0.00..5204.70 rows=155627 width=0)
                    Index Cond: ((date(received_at) >= '2023-11-01'::date) AND (date(received_at) <= '2023-11-30'::date))

The index doesn’t really seem to be helping as the query takes a long time to execute. If I add a date column to the table and use that instead of date(received_at) then the query works better and the explain syntax looks like the following:

GroupAggregate  (cost=0.43..85199.58 rows=10980 width=19)
  Group Key: pacific_date, location_id, purpose_id
  ->  Index Only Scan using inquiries_pacific_date_index on inquiries  (cost=0.43..77813.12 rows=727666 width=16)
        Index Cond: ((pacific_date >= '2023-11-01'::date) AND (pacific_date <= '2023-11-30'::date))

I guess I can do this if I can’t find a better way but it seems kind of redundant. Is there a way I can write my original query so it makes use of the index? Using postgres 13

Tags: postgresql query-optimization

Answers

- LaurenzAlbe
- November 22, 2023 at 3:59 pm
- 0 votes
0
The problem is that due to a limitation in PostgreSQL, it won’t use an index-only scan with your index, because the index contains an expression. You’d have to add received_at to the index (in addition to date(received_at)) to make it work.

Login or Signup to reply.

- ErwinBrandstetter
- November 22, 2023 at 4:42 pm
- 0 votes
0
Drop-in fix

Like Laurenz explained, index-only scans currently (pg 16) suffer from a corner-case limitation in Postgres. The manual:

However, PostgreSQL’s planner is currently not very smart about such
cases. It considers a query to be potentially executable by index-only
scan only when all columns needed by the query are available from
the index.

The manual has more details. One workaround is to "include" the column itself in the index:
```
CREATE INDEX inquiries_date_idx ON inquiries (date(received_at), location_id, purpose_id, identity_id) INCLUDE (received_at);
```
Should give you index-only scans for your original query. But it also makes increases the size of the index (by 8 bytes per row in your case).

If the columns (location_id, purpose_id, identity_id) serve no additional purposes in the index, for the query at hand,those can move to the INCLUDE section as well:
```
CREATE INDEX inquiries_date_idx ON inquiries (date(received_at)) INCLUDE (location_id, purpose_id, identity_id, received_at);
```
Should help some more with a big table such as yours ("9B rows").

Better

Create an index on bare columns, without expressions:
```
CREATE INDEX inquiries_received_at_incl_idx ON inquiries (received_at) INCLUDE (location_id, purpose_id, identity_id);
```
And adjust your query slightly, to be exactly equivalent:
```
SELECT received_at::date, location_id, purpose_id, COUNT(DISTINCT identity_id)
FROM   inquiries
WHERE  received_at >= $1
AND    received_at <  $2 + 1  -- !
GROUP  BY 1, 2, 3;
```
Input $1 and $2 must be type date, as indicated in the question.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Compound index with date function doesn't seem to work

Answers

Drop-in fix

Better