skip to Main Content

I’m currently developing a SOAP service with several operations. Requests and responses are saved in a table called "message". Each Request and each Response is its own entry in the table and each Request/Response pair has a corresponding correlation_id.

I created a SELECT query to display how many times each operation has been used in the last 15 minutes :

SELECT operation,
       count(distinct (correlation_id)) as last_15_minutes
FROM message
WHERE creation_timestamp > (SELECT NOW() - INTERVAL '15 MINUTES')
GROUP BY operation ;

This query works. I get the following correct result when I run it :

operation last_15_minutes      
--------- -------------------- 
5001      17                   
5005      15                   
5013      2                    
5021      7602                 
5201      4    

The problem is that the query is extremely slow. The above result can take more than 30 seconds when many requests have been sent in the last 15 minutes.

Has anyone any idea what could be improved ?

Thank you in advance.

EDIT :

Here’s the script that created this table :

CREATE TABLE message
(
  id bigint GENERATED BY DEFAULT AS IDENTITY (INCREMENT 1 START 10000000 MINVALUE 1 MAXVALUE 9223372036854775807) PRIMARY KEY,
  correlation_id character varying(50) not null,
  operation character varying(4),
  variant character varying(4),
  status bigint,
  message character varying,
  creation_timestamp timestamp without time zone not null,
  version bigint not null
);

Here are the indexes that I have on this table

CREATE INDEX message_creation_timestamp_idx ON message USING btree (creation_timestamp)
CREATE INDEX message_creation_timestamp_operation_variant_idx ON message USING btree (creation_timestamp, operation, variant)

Here’s the result of EXPLAIN (ANALYZE, VERBOSE, BUFFERS) :

EXPLAIN results

2

Answers


  1. This query is slow because there is no useful index and the statistics are way off: The database expects 5.3 million records but it finds just 7 rows.

    First create one of these indexes, or try them all:

    CREATE INDEX message_creation_timestamp_operation_idx ON message USING btree (creation_timestamp, operation);
    
    CREATE INDEX message_creation_timestamp_operation_correlation_id_idx ON message USING btree (creation_timestamp, operation, correlation_id);
    
    CREATE INDEX message_creation_timestamp_operation_correlation_id_2_idx ON message USING btree (creation_timestamp, operation) INCLUDE (correlation_id);
    

    Also run ANALYZE to get the statistics up to date, and check the autovacuum proces. Because that process also does the auto analyze.

    After creating an index, check the query plan again.

    SELECT operation,
           count(distinct correlation_id) as last_15_minutes
    FROM message
    WHERE creation_timestamp > (NOW() - INTERVAL '15 MINUTES')
    GROUP BY operation;
    
    Login or Signup to reply.
  2. The problem was not the index, but rather that the condition was recalculated every time. By deleting "SELECT" From Where Clause, the query speed can practically be reduced to several thousandths…

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search