This is the table schema:
CREATE TABLE IF NOT EXISTS events_table
(
id UUID NOT NULL,
company_id INT8 NOT NULL,
employee_id INT8 NOT NULL,
category_type_id UUID,
event_type VARCHAR(255) NOT NULL,
data_payload bytea NOT NULL,
occurred_at TIMESTAMP WITH TIME ZONE NOT NULL,
stored_at TIMESTAMP WITH TIME ZONE NOT NULL,
PRIMARY KEY (id)
);
We have the following index:
CREATE INDEX events_idx ON events_table (company_id, employee_id, category_type_id, occurred_at, stored_at);
And this is the query:
SELECT
table_name.id,
table_name.company_id,
table_name.employee_id,
table_name.category_type_id,
table_name.type,
table_name.payload,
table_name.occurred_at,
table_name.stored_at
FROM
schema_name.table_name
WHERE
(table_name.company_id = ?
AND table_name.employee_id IN (?)
AND table_name.category_type_id = cast ( ? )
AND table_name.occurred_at <= CAST(? AS TIMESTAMP WITH TIME ZONE))
ORDER BY
table_name.occurred_at ASC,
table_name.stored_at ASC;
Why is the DB sorting in the first place:
Gather Merge (cost=116700.30..129758.08 rows=111916 width=122) (actual time=357.017..512.239 rows=295813 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=115700.28..115840.17 rows=55958 width=122) (actual time=345.202..364.049 rows=98604 loops=3)
Sort Key: occurred_at, stored_at
Sort Method: external merge Disk: 13760kB
Worker 0: Sort Method: external merge Disk: 6280kB
Worker 1: Sort Method: external merge Disk: 6120kB
-> Parallel Bitmap Heap Scan on balance_state_event (cost=6311.95..107650.92 rows=55958 width=122) (actual time=51.027..280.405 rows=98604 loops=3)
Recheck Cond: ((company_id = 100) AND (category_type_id = '0baee077-c735-4f95-bd26-0c6b4b1eddd4'::uuid) AND (employee_id = ANY ('{17,1,19,2,18,11,15,16,20,861,8,12,4,3,5,13,7,6,14,9,10,1021,1022,1023,1024,1025,1026
,1027,1028,1029,1030,1031,1032,1033,1034,1035,1036,1037,1038,1039,1040,1041,1042,1043,1044,1045,1046,1047,1048,1049,1050,1051,1052,1053,1054,1055,1056,1057,1058,1059,1060,1061,1062,1063,1064,1065,1066,1067,1068,1069,1070,1071,10
72,1073,1074,1075,1076,1077,1078,1079,1080,1081,1082,1083,1084,1085,1086,1087,1088,1089,1090,1091,1092,1093,1094,1095,1096,1097,1098,1099,1100}'::bigint[])) AND (occurred_at <= '2024-05-03 00:00:00+00'::timestamp with time zone)
)
Rows Removed by Index Recheck: 64790
Heap Blocks: exact=21346 lossy=17655
-> Bitmap Index Scan on events_idx (cost=0.00..6278.13 rows=134298 width=0) (actual time=49.032..49.032 rows=295813 loops=1)
Index Cond: ((company_id = 100) AND (absence_type_id = '0baee077-c735-4f95-bd26-0c6b4b1eddd4'::uuid) AND (employee_id = ANY ('{17,1,19,2,18,11,15,16,20,861,8,12,4,3,5,13,7,6,14,9,10,1021,1022,1023,1024,1025,
1026,1027,1028,1029,1030,1031,1032,1033,1034,1035,1036,1037,1038,1039,1040,1041,1042,1043,1044,1045,1046,1047,1048,1049,1050,1051,1052,1053,1054,1055,1056,1057,1058,1059,1060,1061,1062,1063,1064,1065,1066,1067,1068,1069,1070,107
1,1072,1073,1074,1075,1076,1077,1078,1079,1080,1081,1082,1083,1084,1085,1086,1087,1088,1089,1090,1091,1092,1093,1094,1095,1096,1097,1098,1099,1100}'::bigint[])) AND (occurred_at <= '2024-05-03 00:00:00+00'::timestamp with time z
one))
Planning Time: 0.445 ms
Execution Time: 528.740 ms
(16 rows)
Parallel Bitmap Heap Scan on balance_state_event first?
should not sort, since the index is maintaining the sort order.
2
Answers
The reason an explicit sort is needed is because the index scan does not necessarily return the rows in sorted order. The cause is the
IN
condition.If you have a three-column index on
(a, b, c)
, the following query does not need an explicit sort, because the index scan returns the rows in sorted order:But this query will need an explicit sort:
The rows would be returned by the index only piecewise ordered, that is, ordered separately for each chunk derived from the same employee_id. This is not the same thing as a total ordering your query specified.
PostgreSQL could get each sorted piece and then interleave them together with something like a "Merge Append", but no one has bothered to implement this. It isn’t clear how this would even be implemented as doing the merge within the index scan node would be weird, while getting the boundaries up into some separate merge node would be tricky.
It is hard to believe this matters, as the sort takes up a minority of the total time and that total time is pretty fast to start with given the number of rows being returned. Getting rid of the sort might even make it slower, as then you would not get the IO benefit that you might be getting from sequential-read-like property of the bitmap and the parallelization might also be more difficult.
You could get a literal Merge Append mode by writing the query as a UNION ALL of a bunch of subqueries, once for each value in the IN-list. You would need to specify the ORDER BY on each subquery as well as on the entire UNION ALL in order to get the desired behavior. This would be rather awkward to write, and while the performance improvement can be spectacular in certain situations it is hard to see how your query is one of those situations.