I have a table foo:
some_fk | some_field | some_date_field |
---|---|---|
1 | A | 1990-01-01 |
1 | B | 1990-01-02 |
1 | C | 1990-03-01 |
1 | X | 1990-04-01 |
2 | B | 1990-01-01 |
2 | B | 1990-01-05 |
2 | Z | 1991-04-11 |
2 | C | 1992-01-01 |
2 | B | 1992-02-01 |
2 | Y | 1992-03-01 |
3 | C | 1990-01-01 |
some_field
has 6 possible values: [A,B,C,X,Y,Z]
Where [A,B,C]
signify opening or continuation events and [X,Y,Z]
signify closing events. How do I get each span of time between the first opening event and closing event of each span, partitioned by some_fk
, as shown in the table below:
some_fk | some_date_field_start | some_date_field_end |
---|---|---|
1 | 1990-01-01 | 1990-04-01 |
2 | 1990-01-01 | 1991-04-11 |
2 | 1992-01-01 | 1992-03-01 |
3 | 1990-01-01 | NULL |
*Note that a non-terminated time span ends with NULL
I do have a solution that involves 3 common table expressions, but I’m wondering if there is a (better/more elegant/canonical) way to do this in PostgreSQL without nested queries.
My approach was something like:
WITH ranked AS (
SELECT
RANK() OVER (PARTITION BY some_fk ORDER BY some_date_field) AS "rank"
some_fk,
some_field,
some_date_field
FROM foo
), openers AS (
SELECT * FROM ranked WHERE some_field IN ('A','B','C')
), closers AS (
SELECT
*,
LAG("rank") OVER (PARTITION BY some_fk ORDER BY "rank") AS rank_lag
FROM ranked WHERE some_field IN ('X','Y','Z')
)
SELECT DISTINCT
openers.some_fk,
FIRST_VALUE(openers.some_date_field) OVER (PARTITION BY some_fk ORDER BY "rank")
AS some_date_field_start,
closers.some_date_field AS some_date_field_end
FROM openers
JOIN closers
ON openers.some_fk = closers.some_fk
WHERE openers.some_date_field BETWEEN COALESCE(closers.rank_lag, 0) AND closers.rank
… but I feel there must be a better way.
Thanks in advance for the help.
2
Answers
The basis of the query is to use
LAG
to determine if the previous record was a closure.This allows you to filter on the correct 4 records from your expected results, with the first 2 columns included; your mistake was to put the
WHERE
clause onto that query directly, when what you want to do is use it as is in a sub-query and write theWHERE
in the main query.From that point, you have several possibilities to finish the query.
Here is a version using a scalar subquery:
Here is another version using a
CROSS JOIN LATERAL
:Another approach is to create a grouping ID by creating a running sum of the closing events. Then in an outer SQL you can Group By and pick min() and max() dates.
This seems a little simpler at least to me.