here is my dataset
processed_on|status |
------------+-------+
2023-01-01 |Success|
2023-01-02 |Success|
2023-01-03 |Success|
2023-01-04 |Fail |
2023-01-05 |Fail |
2023-01-06 |Success|
2023-01-07 |Fail |
2023-01-08 |Success|
2023-01-09 |Success|
expected output is
--------------------------
start_date|end_date|status
--------------------------
2023-01-01|2023-01-03|Success
2023-01-04|2023-01-05|Fail
2023-01-06|2023-01-06|Success
2023-01-07|023-01-07|Fail
2023-01-08|2023-01-09|Success
i tried window function like led and lag .but no luck..
2
Answers
You can use SQL window function to get the result you want in a few steps.
Step 1. First use window function to compare previous status and current status, if they are different then mark it as status bucket starting point.
Step 2. Then use window function to count how many status_bucket_starting_point we have reach so far, define it as status bucket number.
Step 3. At last, group by stauts bucket number and use min()/max() function to get start date and end date of the bucket,
Here is the PostgreSQL query:
This is a typical gaps-and-islands problem. Here, islands are groups of consecutive rows sharing the same status. We can identify them with the difference of row numbers, then aggregate: