skip to Main Content

I have below price audit table in timescale db. When I get new price I should calculate minimum price in last 30 days. But our price data not daily. In every price changes I write date and price to table. Problem is when I try to find the minimum price, I have to go before the last 30 days because there is not enough data. Also there are 130 Million rows in the table.

Example Range

In this case minimum price should be 120. Because this product sold for 120 ₺ between 12 July to 4 August.

DDL Example:

create table price_audit (
    timestamp timestamp with time zone not null,
    partnerid varchar(200)             not null,
    productid varchar(200)             not null,
    price     numeric(12, 2)
);

alter table price_audit
    owner to appuser;

create index _price_audit_timestamp_idx
    on price_audit (timestamp desc);

create index productid_time_desc
    on price_audit (productid asc, timestamp desc)
    where (productid IS NOT NULL);

create index partnerid_time_desc
    on price_audit (partnerid asc, timestamp desc)
    where (partnerid IS NOT NULL);

create index partnerid_productid_time_desc
    on price_audit (partnerid asc, productid asc, timestamp desc)
    where ((partnerid IS NOT NULL) AND (productid IS NOT NULL));

Data Example:

Example Data

Insert Query :

insert into price_audit ("timestamp", partnerid, productid, price)
values ('2022-07-09 18:18:39.000000 +00:00','b319406e-2ca7-4663-a203-68a1928fdc53','ABCTEST1', 120.00),
       ('2022-08-05 21:19:40.000000 +00:00','b319406e-2ca7-4663-a203-68a1928fdc53','ABCTEST1',130.00),
       ('2022-08-10 11:20:39.000000 +00:00','b319406e-2ca7-4663-a203-68a1928fdc53','ABCTEST1',140.00);

When new price came, we should not include in range:
New Price Date: 2022/08/11
30 days subtracted date: 2022/07/12

The following queries take the first price older than 30 days, along with prices from the last 30 days, and return the minimum.

SELECT min(price) from price_audit
         where (timestamp >= (select timestamp from price_audit
         where timestamp < '2022-07-12T15:30:00-00:00' and productid = 'ABCTEST1' and partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53'
         order by timestamp desc limit 1) or
         timestamp >= '2022-07-12T15:30:00-00:00' and timestamp < '2022-08-11T15:30:00-00:00') and 
         productid = 'ABCTEST1' and partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53' and price > 0

The other one with MAX:

SELECT min(price) from price_audit
         where (timestamp >= (select MAX(timestamp) from price_audit
         where timestamp < '2022-07-12T15:30:00-00:00' and productid = 'ABCTEST1' and partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53') or
         timestamp >= '2022-07-12T15:30:00-00:00' and timestamp < '2022-08-11T15:30:00-00:00') and
         productid = 'ABCTEST1' and partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53' and price > 0

Query Cost : 76

Actually this queries working well but I am not sure that all cases is ok. Because if there any data in 2022/07/12:00:00:00 query working wrong. But it’s okay, I can ignore that.

Also, approximate run time is 50-100ms. If that query is ok I want to reduce it below 50Ms because I may have to run this query 1M times in 1 day.

How can we improve the performance of this query?

2

Answers


  1. take the first price older than 30 days, along with prices from the last 30 days, and return the minimum

    I expect this to perform better:

    SELECT LEAST (
      (SELECT min(price)
       FROM   price_audit
       WHERE  partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53'
       AND    productid = 'ABCTEST1'
       AND    price > 0
       AND    timestamp >= '2022-07-12 15:30:00-00:00'
      )
    , (SELECT price
       FROM   price_audit
       WHERE  partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53'
       AND    productid = 'ABCTEST1'
       AND    price > 0
       AND    timestamp < '2022-07-12 15:30:00-00:00'
       ORDER  BY timestamp DESC
       LIMIT  1
       )
       );
    

    Your index partnerid_productid_time_desc should be good for this, but this index should be much better:

    CREATE INDEX price_audit_partnerid_productid_time_desc ON price_audit (partnerid, productid, timestamp DESC) INCLUDE (price)  -- ① 
    WHERE  price > 0;  -- ②
    

    ① The covering index only makes sense if you get index-only scans out of it. See:

    ② Only makes sense if excluding a sizable percentage of rows.

    WHERE partnerid IS NOT NULL AND productid IS NOT NULL (like you have now) never makes sense since both columns are defined NOT NULL.

    Aside:
    I would not use the name "timestamp" for a timestamptz column. Best not to use basic type names as identifiers at all. Too confusing.

    Login or Signup to reply.
  2. I don’t know, if my query has a chance of being faster than yours. Just try.

    Here is what I do: As there should be an index on (productid, partnerid), it should be very quick to find all prices of interest. The new price included, we look at all previous prices. Once we’ve done that we only look at those valid after the start of the 30 day period. So, the first previous price is the one that was valid in the very moment of the period’s start. And the last price is the one that was valid, before we entered the new one.

    select min(prev_price)
    from
    (
      select
        timestamp, 
        lag(price) over (order by timestamp) as prev_price,
        max(timestamp) over () as max_timestamp
      from price_audit
      where productid = 'ABCTEST1'
      and partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53'
      and price > 0
    ) t
    where timestamp > max_timestamp - interval '30 days';
    

    It would help with the sorting, if the index included the timestamp, too. And as the only column missing then would be the price, we can even add that, so we get a covering index. So we end up with the same index that Erwin Brandstetter proposes in his answer, except for the ordering of the timestamp, where I would prefer ascending order for my query (but I guess it doesn’t really make a difference).

    create index idx on price_audit(productid, partnerid, timestamp) include price where price > 0;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search