I’m currently developing a SOAP service with several operations. Requests and responses are saved in a table called "message". Each Request and each Response is its own entry in the table and each Request/Response pair has a corresponding correlation_id.
I created a SELECT query to display how many times each operation has been used in the last 15 minutes :
SELECT operation,
count(distinct (correlation_id)) as last_15_minutes
FROM message
WHERE creation_timestamp > (SELECT NOW() - INTERVAL '15 MINUTES')
GROUP BY operation ;
This query works. I get the following correct result when I run it :
operation last_15_minutes
--------- --------------------
5001 17
5005 15
5013 2
5021 7602
5201 4
The problem is that the query is extremely slow. The above result can take more than 30 seconds when many requests have been sent in the last 15 minutes.
Has anyone any idea what could be improved ?
Thank you in advance.
EDIT :
Here’s the script that created this table :
CREATE TABLE message
(
id bigint GENERATED BY DEFAULT AS IDENTITY (INCREMENT 1 START 10000000 MINVALUE 1 MAXVALUE 9223372036854775807) PRIMARY KEY,
correlation_id character varying(50) not null,
operation character varying(4),
variant character varying(4),
status bigint,
message character varying,
creation_timestamp timestamp without time zone not null,
version bigint not null
);
Here are the indexes that I have on this table
CREATE INDEX message_creation_timestamp_idx ON message USING btree (creation_timestamp)
CREATE INDEX message_creation_timestamp_operation_variant_idx ON message USING btree (creation_timestamp, operation, variant)
Here’s the result of EXPLAIN (ANALYZE, VERBOSE, BUFFERS) :
2
Answers
This query is slow because there is no useful index and the statistics are way off: The database expects 5.3 million records but it finds just 7 rows.
First create one of these indexes, or try them all:
Also run ANALYZE to get the statistics up to date, and check the autovacuum proces. Because that process also does the auto analyze.
After creating an index, check the query plan again.
The problem was not the index, but rather that the condition was recalculated every time. By deleting "SELECT" From Where Clause, the query speed can practically be reduced to several thousandths…