I would like to apply window function between times that beyond midnight.
The following is a part of example data of per-minute values one stock.
Actually it does exists more longer that the example.
Data type of datetime is timestamptz and others are int.
I want to query diff of first price and last price between 22:30-04:21 for each day.
How I can do it?
I want to do seems like following but it does not work.
select distinct datetime::date, first_value(open) over w as first, last_value(close) over w as last
from price_table window w as
(partition by datetime::time range between 22:30 and 04:21);
price_table:
datetime open high low close volume // type of datetime is timestamptz others are int.
...
// there is a more older value (several years actually)
2023/01/03 23:50 25830 25840 25830 25830 1139
2023/01/03 23:51 25835 25835 25820 25825 786
2023/01/03 23:52 25825 25845 25825 25835 1196
2023/01/03 23:53 25840 25840 25825 25825 874
2023/01/03 23:54 25825 25835 25820 25825 680
2023/01/03 23:55 25820 25825 25810 25815 1237
2023/01/03 23:56 25810 25815 25805 25810 1163
2023/01/03 23:57 25810 25835 25810 25815 1753
2023/01/03 23:58 25810 25840 25810 25830 823
2023/01/03 23:59 25835 25845 25830 25845 1008
2023/01/03 0:00 25845 25855 25840 25850 1235
2023/01/03 0:01 25850 25850 25845 25845 439
2023/01/03 0:02 25845 25850 25835 25840 1146
2023/01/03 0:03 25840 25855 25840 25845 668
...
Expected Result table: (diff
is a result of subtraction of last value in the period (close) and first value in the period(close).)
date open close diff
2023/01/03 25810 25840 30
2023/01/04 25830 25820 20
2
Answers
One thing you can do to solve the different days is to subtract (or add) a certain amount of time to each date so that the values around midnight will be considered for the same date. For example like this:
Here the values around midnight are considered for the day before midnight, e.g. "2023/01/04 0:03" is considered for 2023/01/03. If you would like to consider them for the day after midnight instead, you have to change
- INTERVAL '5 hour'
to+ INTERVAL '2 hour'
.The following query produces the results described in the original question:
The query uses a common table expression (CTE) to reduce redundant calculations. Although not a major issue in this case, repeated code can be a maintenance issue when the calculations are more complex or produce values that are needed in multiple places. Even for a query as simple as this one, repeating calculations would make the code more difficult to understand and increase opportunities to introduce defects.
The query starts by determining the effective date of each price record by adding an offset and then keeping the date portion of the resulting value. Rows are filtered by the unadjusted time and grouped by price_date regardless of whether a record’s time occurs before or after midnight.
Since
FIRST_VALUE
andLAST_VALUE
are window functions but not aggregate functions, they can’t reference non-grouped values; however, the desired results can be obtained by collecting values into sorted arrays and taking the first element. To get the last value, the array is sorted in descending order so that it isn’t necessary to know how many elements are in the array.The intuition to employ
FIRST_VALUE
andLAST_VALUE
was reasonable, but leads to a sligtly verbose solution. Contrast the above query with the following:The function names
FIRST_VALUE
andLAST_VALUE
reflect the operation to be performed and are therefore more intuitive than the use ofARRAY_AGG
in the first query; however, that advantage is offset by the amount of additional code required. Some of the extra code can be eliminated; e.g., the window frame definition can be omitted ifFIRST_VALUE
is used instead ofLAST_VALUE
and the order is changed todatetime DESC
. While doing so reduces the amount of code, it also dispenses with the intuitive advantage ofLAST_VALUE
overARRAY_AGG
. A bigger issue is the need to repeat the partitioning expression. It’s a minor annoyance in such a brief query, but becomes a more significant concern as the number of repetitions increases.Avoid using SQL keywords as column names.
CLOSE
,DATE
, andOPEN
are keywords in PostgreSQL and SQL and are also reserved in SQL. I would also avoid using datetime as a column name, it’s not descriptive and is used as a type or function name in some SQL implementations; e.g., SQLite, MySQL, and SQL Server.