Postgresql - Function to encapsulate lookup logic

Stuart
June 6, 2024
159 views
0 votes
2 Answers

I want to write a Postgres function that looks up the value of a column in a given table.

Example setup:

CREATE SCHEMA IF NOT EXISTS example;

CREATE TABLE example.price_table (
    symbol text NOT NULL,
    time_of timestamptz NOT NULL,
    price float8 null
    volume float8 null );

CREATE TABLE example.interesting_times (
    symbol text NOT NULL,
    time_of timestamptz NOT NULL );

Sample data

INSERT INTO example.price_table VALUES
    ('GME', '2016-01-01 00:00:00', 10.0, 10),
    ('GME', '2016-01-02 00:00:00', 11.0, 3),
    ('GME', '2016-01-03 00:00:00', 9.0 , 7),
    ('GME', '2016-01-03 00:40:00', 10.5, 76),
    ('GME', '2016-01-04 00:00:00', 11.0, 10),
    ('BBBY', '2016-01-01 00:00:00', 20.0, 5),
    ('BBBY', '2016-01-02 02:30:00', 2123.0, 1),
    ('BBBY', '2016-01-02 03:30:00', 2133.0, 1),
    ('BBBY', '2016-01-02 04:30:00', 2144.0, 100),
    ('BBBY', '2016-01-03 00:00:00', 29.0, 999),
    ('BBBY', '2016-01-03 03:40:00', 20.5, 1),
    ('BBBY', '2016-01-03 04:40:00', 20.6, 54),
    ('BBBY', '2016-01-04 00:00:00', 21.0, 34),
    ('BBBY', '2016-01-06 00:00:00', 666.0, 1);   
 
INSERT INTO example.interesting_times VALUES
    ('GME', '2016-01-01 00:00:00'),
    ('GME', '2016-01-02 00:30:00'),
    ('GME', '2016-01-03 05:00:00'),
    ('GME', '2016-01-03 00:40:00'),
    ('GME', '2016-01-04 00:00:00'),
    ('BBBY', '2016-01-01 01:00:00'),
    ('BBBY', '2016-01-02 00:00:00'),
    ('BBBY', '2016-01-03 00:00:00'),
    ('BBBY', '2016-01-03 07:40:00'),
    ('BBBY', '2016-01-04 00:00:00');

While querying the interesting_times table I want to add the earliest market price from the markout_price_table table, at least 3 hours, but not more than 3 hours + 24 hours into the future. Ideally, I would just call a markout_price_table function like:

select symbol
     , time_of
     , markout_price_table(  symbol
                           , time_of
                           , 'price'
                           , '3 hours'
                           , '24 hours') as price_3_hours_later
from example.interesting_times 
order by symbol, time_of;

I could get different markout times by adding:

markout_price_table(symbol, time_of, 'price', '6 hours', '24 hours') as price_6_hours_later`

markout_price_table(symbol, time_of, 'volume', '0 hours', '24 hours') as volume_at_time

Is it possible to encapsulate the logic in a function like this? If so, would it be as fast as getting all of the data downloaded and doing the merges locally?

I can get the desired result of the above using a few merges but I don’t like this as a solution as I would like to encapsulate good logic for the merges rather than copy pasting the below lots of times into each query.

with tabA as (
    select a.symbol as symbol
         , a.time_of as time_of
         , b.price as price_3_hours_later
         , b.time_of - (a.time_of + '3 hours')  as timeliness
    from  example.interesting_times a 
    left join example.price_table b 
      on (a.symbol = b.symbol) 
     and (a.time_of + '3 hours' < b.time_of) 
     and (a.time_of + '3 hours' + '24 hours' > b.time_of) 
    order by b.time_of - (a.time_of + '3 hours')
)
, tabB as (
    select  symbol
          , time_of
          , price_3_hours_later
          , timeliness
          , min(timeliness) over w1 as min_tim
    from tabA
    window w1 as (partition by symbol, time_of order by timeliness) 
) 
select symbol
     , time_of
     , price_3_hours_later 
from tabB 
where (timeliness = min_tim) 
   or (timeliness is null) 
order by symbol, time_of;

So the result should look like:

Symbol  time_of                         price_3_hours_later
BBBY    2016-01-01 01:00:00.000 +0000   2123.0
BBBY    2016-01-02 00:00:00.000 +0000   2133.0
BBBY    2016-01-03 00:00:00.000 +0000   20.5
BBBY    2016-01-03 07:40:00.000 +0000   21.0
BBBY    2016-01-04 00:00:00.000 +0000   NULL
GME     2016-01-01 00:00:00.000 +0000   11.0
GME     2016-01-02 00:30:00.000 +0000   9.0
GME     2016-01-03 00:40:00.000 +0000   11.0
GME     2016-01-03 05:00:00.000 +0000   11.0
GME     2016-01-04 00:00:00.000 +0000   NULL

Answers

You can shorten that query and wrap it in a function: demo at db<>fiddle

create function markout_price_table(
  p_symbol text, p_time_of timestamptz, min_after interval, max_after interval)
returns float8 language sql return (
  select price from example.price_table 
   where symbol=p_symbol
     and time_of between p_time_of+min_after
                     and p_time_of+max_after
   order by time_of limit 1);

And that would work exactly how you wanted:

select symbol
     , time_of
     , markout_price_table(symbol, time_of, '3 hours', '27 hours') as price_3_hours_later
from example.interesting_times 
order by symbol, time_of;

symbol	time_of	price_3_hours_later
BBBY	2016-01-01 01:00:00+00	2123
BBBY	2016-01-02 00:00:00+00	2133
BBBY	2016-01-03 00:00:00+00	20.5
BBBY	2016-01-03 07:40:00+00	21
BBBY	2016-01-04 00:00:00+00	null
GME	2016-01-01 00:00:00+00	11
GME	2016-01-02 00:30:00+00	9
GME	2016-01-03 00:40:00+00	11
GME	2016-01-03 05:00:00+00	11
GME	2016-01-04 00:00:00+00	null

Since you put price right in the name of the function, I don’t think it makes sense to pass 'price' as a parameter.

If you wanted the function to dynamically select the price column, it would complicate things a bit. You would have to make it a polymorphic function and pass a sample value into into it, so that it can infer its return type at call time.

Can be solved with a single, plain SQL statement using a LATERAL subquery:

SELECT i.symbol, i.time_of, p.price_3_hours_later
FROM   interesting_times i
LEFT   JOIN LATERAL (
   SELECT p.price AS price_3_hours_later
   FROM   price_table p
   WHERE  p.symbol = i.symbol
   AND    p.time_of >= i.time_of + interval '3h'
   AND    p.time_of <= i.time_of + interval '27h'
   ORDER  BY p.time_of
   LIMIT  1
   )p ON true
ORDER  BY i.symbol, i.time_of;

This needs an index on price_table (symbol, time_of) to be fast. Ideally:

CREATE INDEX ON price_table (symbol, time_of) INCLUDE (price);

You can wrap this into a SQL function if you insist:

CREATE OR REPLACE FUNCTION markout_price_table(_start interval, _duration interval)
  RETURNS TABLE (symbol text
               , time_of timestamptz
               , price float8
                )
  LANGUAGE sql STABLE STRICT PARALLEL SAFE AS
$func$
SELECT i.symbol, i.time_of, p.price_3_hours_later
FROM   interesting_times i
LEFT   JOIN LATERAL (
   SELECT p.price AS price_3_hours_later
   FROM   price_table p
   WHERE  p.symbol = i.symbol
   AND    p.time_of >= i.time_of + _start
   AND    p.time_of <= i.time_of + _start + _duration
   ORDER  BY p.time_of
   LIMIT  1
   )p ON true
ORDER  BY i.symbol, i.time_of;
$func$;

Call:

SELECT * FROM markout_price_table('3h', '24h');

Zegarek already provided a valid drop-in version of the function. But be sure to add appropriate function labels to not hinder performance:

CREATE OR REPLACE FUNCTION markout_price(_symbol text, _time_of timestamptz, _start interval, _duration interval)
  RETURNS float8
  LANGUAGE sql STABLE STRICT PARALLEL SAFE AS  -- !
$func$
SELECT p.price
FROM   price_table p
WHERE  p.symbol = _symbol
AND    p.time_of >= _time_of + _start
AND    p.time_of <= _time_of + _start + _duration
ORDER  BY p.time_of
LIMIT  1;
$func$;

Call:

SELECT i.symbol, i.time_of
     , markout_price(i.symbol, i.time_of, '3 h', '24 h') AS price_3_hours_later
FROM   interesting_times i;

fiddle

so would it be as fast as getting all of the data downloaded and doing the merges locally?

Actually, either solution (including Zegarek’s) will be substantially faster, as long as you have that index.

Optimize GROUP BY query to retrieve latest row per user

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – Function to encapsulate lookup logic

Answers