I have a table named assets. Here is the ddl:
create table assets (
id bigint primary key,
name varchar(255) not null,
value double precision not null,
business_time timestamp with time zone,
insert_time timestamp with time zone default now() not null
);
create index idx_assets_name on assets (name);
I need to extract the newest (based on insert_time) value for each asset name. This is the query that I initially used:
SELECT DISTINCT
ON (a.name) *
FROM home.assets a
WHERE a.name IN (
'USD_RLS',
'EUR_RLS',
'SEKKEH_RLS',
'NIM_SEKKEH_RLS',
'ROB_SEKKEH_RLS',
'BAHAR_RLS',
'GOLD_18_RLS',
'GOLD_OUNCE_USD',
'SILVER_OUNCE_USD',
'PLATINUM_OUNCE_USD',
'GOLD_MESGHAL_RLS',
'GOLD_24_RLS',
'STOCK_IR',
'AED_RLS',
'GBP_RLS',
'CAD_RLS',
'CHF_RLS',
'TRY_RLS',
'AUD_RLS',
'JPY_RLS',
'CNY_RLS',
'RUB_RLS',
'BTC_USD'
)
ORDER BY a.name,
a.insert_time DESC;
I have around 300,000 rows in the assets table. On my VPS this query takes about 800 ms. this is causing a whole response time of about 1 second for a specific endpoint. This is a bit slow and considering the fact that the assets table is growing fast, this endpoint will be even slower in the near future. I also tried to avoid IN(...)
using this query:
SELECT DISTINCT
ON (a.name) *
FROM home.assets a
ORDER BY a.name,
a.insert_time DESC;
But I didn’t notice a significant difference. Any idea how I could optimize this query?
2
Answers
You may try adding the following index to your table:
If used, Postgres can simply scan this index to find the distinct record having the most recent
insert_time
for each name.For more than a few rows per
name
in the table (looks to be so), I expect this query to be substantially faster, yet:Pass your list as array, unnest, and then get each latest row in a
LATERAL
subquery. TheCROSS JOIN
eliminates names that are not found at all. (You might be interested inLEFT JOIN LATERAL ... ON true
instead, to keep those in the result.)You still need the multicolumn index that Tim mentioned.
Default ascending sort order would work, too, in this case. Postgres can scan backwards:
See:
Not the number of rows in the table, but the number of rows per group (per
name
in your case) decides whetherDISTINCT ON
is the best choice. See this benchmark comparing relevant query styles: