skip to Main Content

I want to write a Postgres function that looks up the value of a column in a given table.

Example setup:

CREATE SCHEMA IF NOT EXISTS example;

CREATE TABLE example.price_table (
    symbol text NOT NULL,
    time_of timestamptz NOT NULL,
    price float8 null
    volume float8 null );

CREATE TABLE example.interesting_times (
    symbol text NOT NULL,
    time_of timestamptz NOT NULL );

Sample data

INSERT INTO example.price_table VALUES
    ('GME', '2016-01-01 00:00:00', 10.0, 10),
    ('GME', '2016-01-02 00:00:00', 11.0, 3),
    ('GME', '2016-01-03 00:00:00', 9.0 , 7),
    ('GME', '2016-01-03 00:40:00', 10.5, 76),
    ('GME', '2016-01-04 00:00:00', 11.0, 10),
    ('BBBY', '2016-01-01 00:00:00', 20.0, 5),
    ('BBBY', '2016-01-02 02:30:00', 2123.0, 1),
    ('BBBY', '2016-01-02 03:30:00', 2133.0, 1),
    ('BBBY', '2016-01-02 04:30:00', 2144.0, 100),
    ('BBBY', '2016-01-03 00:00:00', 29.0, 999),
    ('BBBY', '2016-01-03 03:40:00', 20.5, 1),
    ('BBBY', '2016-01-03 04:40:00', 20.6, 54),
    ('BBBY', '2016-01-04 00:00:00', 21.0, 34),
    ('BBBY', '2016-01-06 00:00:00', 666.0, 1);   
 
INSERT INTO example.interesting_times VALUES
    ('GME', '2016-01-01 00:00:00'),
    ('GME', '2016-01-02 00:30:00'),
    ('GME', '2016-01-03 05:00:00'),
    ('GME', '2016-01-03 00:40:00'),
    ('GME', '2016-01-04 00:00:00'),
    ('BBBY', '2016-01-01 01:00:00'),
    ('BBBY', '2016-01-02 00:00:00'),
    ('BBBY', '2016-01-03 00:00:00'),
    ('BBBY', '2016-01-03 07:40:00'),
    ('BBBY', '2016-01-04 00:00:00');

While querying the interesting_times table I want to add the earliest market price from the markout_price_table table, at least 3 hours, but not more than 3 hours + 24 hours into the future. Ideally, I would just call a markout_price_table function like:

select symbol
     , time_of
     , markout_price_table(  symbol
                           , time_of
                           , 'price'
                           , '3 hours'
                           , '24 hours') as price_3_hours_later
from example.interesting_times 
order by symbol, time_of;

I could get different markout times by adding:

markout_price_table(symbol, time_of, 'price', '6 hours', '24 hours') as price_6_hours_later`

or

markout_price_table(symbol, time_of, 'volume', '0 hours', '24 hours') as volume_at_time

Is it possible to encapsulate the logic in a function like this? If so, would it be as fast as getting all of the data downloaded and doing the merges locally?

I can get the desired result of the above using a few merges but I don’t like this as a solution as I would like to encapsulate good logic for the merges rather than copy pasting the below lots of times into each query.

with tabA as (
    select a.symbol as symbol
         , a.time_of as time_of
         , b.price as price_3_hours_later
         , b.time_of - (a.time_of + '3 hours')  as timeliness
    from  example.interesting_times a 
    left join example.price_table b 
      on (a.symbol = b.symbol) 
     and (a.time_of + '3 hours' < b.time_of) 
     and (a.time_of + '3 hours' + '24 hours' > b.time_of) 
    order by b.time_of - (a.time_of + '3 hours')
)
, tabB as (
    select  symbol
          , time_of
          , price_3_hours_later
          , timeliness
          , min(timeliness) over w1 as min_tim
    from tabA
    window w1 as (partition by symbol, time_of order by timeliness) 
) 
select symbol
     , time_of
     , price_3_hours_later 
from tabB 
where (timeliness = min_tim) 
   or (timeliness is null) 
order by symbol, time_of;

So the result should look like:

Symbol  time_of                         price_3_hours_later
BBBY    2016-01-01 01:00:00.000 +0000   2123.0
BBBY    2016-01-02 00:00:00.000 +0000   2133.0
BBBY    2016-01-03 00:00:00.000 +0000   20.5
BBBY    2016-01-03 07:40:00.000 +0000   21.0
BBBY    2016-01-04 00:00:00.000 +0000   NULL
GME     2016-01-01 00:00:00.000 +0000   11.0
GME     2016-01-02 00:30:00.000 +0000   9.0
GME     2016-01-03 00:40:00.000 +0000   11.0
GME     2016-01-03 05:00:00.000 +0000   11.0
GME     2016-01-04 00:00:00.000 +0000   NULL

2

Answers


  1. You can shorten that query and wrap it in a function: demo at db<>fiddle

    create function markout_price_table(
      p_symbol text, p_time_of timestamptz, min_after interval, max_after interval)
    returns float8 language sql return (
      select price from example.price_table 
       where symbol=p_symbol
         and time_of between p_time_of+min_after
                         and p_time_of+max_after
       order by time_of limit 1);
    

    And that would work exactly how you wanted:

    select symbol
         , time_of
         , markout_price_table(symbol, time_of, '3 hours', '27 hours') as price_3_hours_later
    from example.interesting_times 
    order by symbol, time_of;
    
    symbol time_of price_3_hours_later
    BBBY 2016-01-01 01:00:00+00 2123
    BBBY 2016-01-02 00:00:00+00 2133
    BBBY 2016-01-03 00:00:00+00 20.5
    BBBY 2016-01-03 07:40:00+00 21
    BBBY 2016-01-04 00:00:00+00 null
    GME 2016-01-01 00:00:00+00 11
    GME 2016-01-02 00:30:00+00 9
    GME 2016-01-03 00:40:00+00 11
    GME 2016-01-03 05:00:00+00 11
    GME 2016-01-04 00:00:00+00 null

    Since you put price right in the name of the function, I don’t think it makes sense to pass 'price' as a parameter.

    If you wanted the function to dynamically select the price column, it would complicate things a bit. You would have to make it a polymorphic function and pass a sample value into into it, so that it can infer its return type at call time.

    Login or Signup to reply.
  2. Can be solved with a single, plain SQL statement using a LATERAL subquery:

    SELECT i.symbol, i.time_of, p.price_3_hours_later
    FROM   interesting_times i
    LEFT   JOIN LATERAL (
       SELECT p.price AS price_3_hours_later
       FROM   price_table p
       WHERE  p.symbol = i.symbol
       AND    p.time_of >= i.time_of + interval '3h'
       AND    p.time_of <= i.time_of + interval '27h'
       ORDER  BY p.time_of
       LIMIT  1
       )p ON true
    ORDER  BY i.symbol, i.time_of;
    

    This needs an index on price_table (symbol, time_of) to be fast. Ideally:

    CREATE INDEX ON price_table (symbol, time_of) INCLUDE (price);
    

    You can wrap this into a SQL function if you insist:

    CREATE OR REPLACE FUNCTION markout_price_table(_start interval, _duration interval)
      RETURNS TABLE (symbol text
                   , time_of timestamptz
                   , price float8
                    )
      LANGUAGE sql STABLE STRICT PARALLEL SAFE AS
    $func$
    SELECT i.symbol, i.time_of, p.price_3_hours_later
    FROM   interesting_times i
    LEFT   JOIN LATERAL (
       SELECT p.price AS price_3_hours_later
       FROM   price_table p
       WHERE  p.symbol = i.symbol
       AND    p.time_of >= i.time_of + _start
       AND    p.time_of <= i.time_of + _start + _duration
       ORDER  BY p.time_of
       LIMIT  1
       )p ON true
    ORDER  BY i.symbol, i.time_of;
    $func$;
    

    Call:

    SELECT * FROM markout_price_table('3h', '24h');
    

    Zegarek already provided a valid drop-in version of the function. But be sure to add appropriate function labels to not hinder performance:

    CREATE OR REPLACE FUNCTION markout_price(_symbol text, _time_of timestamptz, _start interval, _duration interval)
      RETURNS float8
      LANGUAGE sql STABLE STRICT PARALLEL SAFE AS  -- !
    $func$
    SELECT p.price
    FROM   price_table p
    WHERE  p.symbol = _symbol
    AND    p.time_of >= _time_of + _start
    AND    p.time_of <= _time_of + _start + _duration
    ORDER  BY p.time_of
    LIMIT  1;
    $func$;
    

    Call:

    SELECT i.symbol, i.time_of
         , markout_price(i.symbol, i.time_of, '3 h', '24 h') AS price_3_hours_later
    FROM   interesting_times i;
    

    fiddle

    so would it be as fast as getting all of the data downloaded and doing the merges locally?

    Actually, either solution (including Zegarek’s) will be substantially faster, as long as you have that index.

    Related:

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search