I have below price audit table in timescale db. When I get new price I should calculate minimum price in last 30 days. But our price data not daily. In every price changes I write date and price to table. Problem is when I try to find the minimum price, I have to go before the last 30 days because there is not enough data. Also there are 130 Million rows in the table.
In this case minimum price should be 120. Because this product sold for 120 ₺ between 12 July to 4 August.
DDL Example:
create table price_audit (
timestamp timestamp with time zone not null,
partnerid varchar(200) not null,
productid varchar(200) not null,
price numeric(12, 2)
);
alter table price_audit
owner to appuser;
create index _price_audit_timestamp_idx
on price_audit (timestamp desc);
create index productid_time_desc
on price_audit (productid asc, timestamp desc)
where (productid IS NOT NULL);
create index partnerid_time_desc
on price_audit (partnerid asc, timestamp desc)
where (partnerid IS NOT NULL);
create index partnerid_productid_time_desc
on price_audit (partnerid asc, productid asc, timestamp desc)
where ((partnerid IS NOT NULL) AND (productid IS NOT NULL));
Data Example:
Insert Query :
insert into price_audit ("timestamp", partnerid, productid, price)
values ('2022-07-09 18:18:39.000000 +00:00','b319406e-2ca7-4663-a203-68a1928fdc53','ABCTEST1', 120.00),
('2022-08-05 21:19:40.000000 +00:00','b319406e-2ca7-4663-a203-68a1928fdc53','ABCTEST1',130.00),
('2022-08-10 11:20:39.000000 +00:00','b319406e-2ca7-4663-a203-68a1928fdc53','ABCTEST1',140.00);
When new price came, we should not include in range:
New Price Date: 2022/08/11
30 days subtracted date: 2022/07/12
The following queries take the first price older than 30 days, along with prices from the last 30 days, and return the minimum.
SELECT min(price) from price_audit
where (timestamp >= (select timestamp from price_audit
where timestamp < '2022-07-12T15:30:00-00:00' and productid = 'ABCTEST1' and partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53'
order by timestamp desc limit 1) or
timestamp >= '2022-07-12T15:30:00-00:00' and timestamp < '2022-08-11T15:30:00-00:00') and
productid = 'ABCTEST1' and partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53' and price > 0
The other one with MAX:
SELECT min(price) from price_audit
where (timestamp >= (select MAX(timestamp) from price_audit
where timestamp < '2022-07-12T15:30:00-00:00' and productid = 'ABCTEST1' and partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53') or
timestamp >= '2022-07-12T15:30:00-00:00' and timestamp < '2022-08-11T15:30:00-00:00') and
productid = 'ABCTEST1' and partnerid = 'b319406e-2ca7-4663-a203-68a1928fdc53' and price > 0
Query Cost : 76
Actually this queries working well but I am not sure that all cases is ok. Because if there any data in 2022/07/12:00:00:00 query working wrong. But it’s okay, I can ignore that.
Also, approximate run time is 50-100ms. If that query is ok I want to reduce it below 50Ms because I may have to run this query 1M times in 1 day.
How can we improve the performance of this query?
2
Answers
I expect this to perform better:
Your index
partnerid_productid_time_desc
should be good for this, but this index should be much better:① The covering index only makes sense if you get index-only scans out of it. See:
② Only makes sense if excluding a sizable percentage of rows.
WHERE partnerid IS NOT NULL AND productid IS NOT NULL
(like you have now) never makes sense since both columns are definedNOT NULL
.Aside:
I would not use the name "timestamp" for a
timestamptz
column. Best not to use basic type names as identifiers at all. Too confusing.I don’t know, if my query has a chance of being faster than yours. Just try.
Here is what I do: As there should be an index on (productid, partnerid), it should be very quick to find all prices of interest. The new price included, we look at all previous prices. Once we’ve done that we only look at those valid after the start of the 30 day period. So, the first previous price is the one that was valid in the very moment of the period’s start. And the last price is the one that was valid, before we entered the new one.
It would help with the sorting, if the index included the timestamp, too. And as the only column missing then would be the price, we can even add that, so we get a covering index. So we end up with the same index that Erwin Brandstetter proposes in his answer, except for the ordering of the timestamp, where I would prefer ascending order for my query (but I guess it doesn’t really make a difference).