skip to Main Content

I would like to apply window function between times that beyond midnight.
The following is a part of example data of per-minute values one stock.
Actually it does exists more longer that the example.

Data type of datetime is timestamptz and others are int.

I want to query diff of first price and last price between 22:30-04:21 for each day.

How I can do it?

I want to do seems like following but it does not work.

select distinct datetime::date, first_value(open) over w as first, last_value(close) over w as last 
from price_table window w as 
(partition by datetime::time range between 22:30 and 04:21); 

price_table:

datetime    open    high    low     close   volume // type of datetime is timestamptz others are int.
...
 
// there is a more older value (several years actually) 

2023/01/03  23:50   25830   25840   25830   25830   1139
2023/01/03  23:51   25835   25835   25820   25825   786
2023/01/03  23:52   25825   25845   25825   25835   1196
2023/01/03  23:53   25840   25840   25825   25825   874
2023/01/03  23:54   25825   25835   25820   25825   680
2023/01/03  23:55   25820   25825   25810   25815   1237
2023/01/03  23:56   25810   25815   25805   25810   1163
2023/01/03  23:57   25810   25835   25810   25815   1753
2023/01/03  23:58   25810   25840   25810   25830   823
2023/01/03  23:59   25835   25845   25830   25845   1008
2023/01/03  0:00    25845   25855   25840   25850   1235
2023/01/03  0:01    25850   25850   25845   25845   439
2023/01/03  0:02    25845   25850   25835   25840   1146
2023/01/03  0:03    25840   25855   25840   25845   668
...

Expected Result table: (diff is a result of subtraction of last value in the period (close) and first value in the period(close).)

date          open     close    diff
2023/01/03    25810    25840    30
2023/01/04    25830    25820    20

2

Answers


  1. One thing you can do to solve the different days is to subtract (or add) a certain amount of time to each date so that the values around midnight will be considered for the same date. For example like this:

    select distinct (datetime - INTERVAL '5 hour')::date,
    first_value(open) over (partition by (datetime - INTERVAL '5 hour')::date order by datetime) as open,
    last_value(close) over (partition by (datetime - INTERVAL '5 hour')::date order by datetime RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as close,
    last_value(close) over (partition by (datetime - INTERVAL '5 hour')::date order by datetime RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) - first_value(open) over (partition by (datetime - INTERVAL '5 hour')::date order by datetime) as diff
    from price_table
    where datetime::time between '22:30' and '23:59' or datetime::time between '00:00' and '04:21'; 
    

    Here the values around midnight are considered for the day before midnight, e.g. "2023/01/04 0:03" is considered for 2023/01/03. If you would like to consider them for the day after midnight instead, you have to change - INTERVAL '5 hour' to + INTERVAL '2 hour'.

    Login or Signup to reply.
  2. The following query produces the results described in the original question:

    WITH daily_prices AS (
      SELECT
          DATE_TRUNC('day', datetime + INTERVAL 'PT1H30M')::date AS date,
          (ARRAY_AGG(open ORDER BY datetime))[1] AS open,
          (ARRAY_AGG(close ORDER BY datetime DESC))[1] AS close
        FROM
          price_table
        WHERE
             datetime::time >= '21:30'
          OR datetime::time <= '04:21'
        GROUP BY
          date)
    SELECT date, open, close, close - open as diff
      FROM daily_prices
      ORDER BY date;
    

    The query uses a common table expression (CTE) to reduce redundant calculations. Although not a major issue in this case, repeated code can be a maintenance issue when the calculations are more complex or produce values that are needed in multiple places. Even for a query as simple as this one, repeating calculations would make the code more difficult to understand and increase opportunities to introduce defects.

    The query starts by determining the effective date of each price record by adding an offset and then keeping the date portion of the resulting value. Rows are filtered by the unadjusted time and grouped by price_date regardless of whether a record’s time occurs before or after midnight.

    Since FIRST_VALUE and LAST_VALUE are window functions but not aggregate functions, they can’t reference non-grouped values; however, the desired results can be obtained by collecting values into sorted arrays and taking the first element. To get the last value, the array is sorted in descending order so that it isn’t necessary to know how many elements are in the array.

    The intuition to employ FIRST_VALUE and LAST_VALUE was reasonable, but leads to a sligtly verbose solution. Contrast the above query with the following:

    WITH parms AS (
      SELECT INTERVAL 'PT1H30M' as time_shift),
    daily_prices AS (
      SELECT DISTINCT
        (datetime + parms.time_shift)::date AS date,
        FIRST_VALUE(open) OVER (PARTITION by (datetime + parms.time_shift)::date
                                ORDER BY datetime) AS open,
        LAST_VALUE(close) OVER (PARTITION by (datetime + parms.time_shift)::date
                                 ORDER BY datetime
                                 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS close
        FROM
          parms CROSS JOIN price_table
        WHERE
             datetime::time >= '22:30'
          OR datetime::time <= '04:21')
    SELECT date, open, close, close - open as diff
      FROM daily_prices
      ORDER BY date;
    

    The function names FIRST_VALUE and LAST_VALUE reflect the operation to be performed and are therefore more intuitive than the use of ARRAY_AGG in the first query; however, that advantage is offset by the amount of additional code required. Some of the extra code can be eliminated; e.g., the window frame definition can be omitted if FIRST_VALUE is used instead of LAST_VALUE and the order is changed to datetime DESC. While doing so reduces the amount of code, it also dispenses with the intuitive advantage of LAST_VALUE over ARRAY_AGG. A bigger issue is the need to repeat the partitioning expression. It’s a minor annoyance in such a brief query, but becomes a more significant concern as the number of repetitions increases.

    Avoid using SQL keywords as column names. CLOSE, DATE, and OPEN are keywords in PostgreSQL and SQL and are also reserved in SQL. I would also avoid using datetime as a column name, it’s not descriptive and is used as a type or function name in some SQL implementations; e.g., SQLite, MySQL, and SQL Server.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search