Postgresql - Fill a column with repeated thresholds

norok2
March 15, 2024

Assume I have two tables a and b, each containing a series of dates:

Table a:

day_date
2020-1-1
2020-1-2
2020-1-3
2020-1-4
2020-1-5
2020-1-6
2020-1-7
2020-1-8
2020-1-9
2020-1-10

and table b:

id	some_date
0	2020-1-3
0	2020-1-6
0	2020-1-8
1	2020-1-2
1	2020-1-5
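
For anyone who wants to reproduce the example, the two tables could be set up roughly as follows. This is only a sketch: the column types (date and integer) are assumptions, since the question does not state them.

```sql
-- Assumed schema: the question does not give column types.
CREATE TABLE a (day_date date);
CREATE TABLE b (id integer, some_date date);

-- Ten consecutive days, 2020-01-01 through 2020-01-10.
INSERT INTO a (day_date)
SELECT d::date
FROM   generate_series(timestamp '2020-01-01',
                       timestamp '2020-01-10',
                       interval '1 day') AS d;

-- The threshold dates per id, as listed in table b above.
INSERT INTO b (id, some_date) VALUES
  (0, '2020-01-03'),
  (0, '2020-01-06'),
  (0, '2020-01-08'),
  (1, '2020-01-02'),
  (1, '2020-01-05');
```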

I want to create a new table c that contains day_date, id, and a next_date column. For each combination of day_date and id, next_date should be the smallest some_date for that id that is strictly greater than day_date, or null if there is none, i.e.:

day_date	id	next_date
2020-1-1	0	2020-1-3
2020-1-2	0	2020-1-3
2020-1-3	0	2020-1-6
2020-1-4	0	2020-1-6
2020-1-5	0	2020-1-6
2020-1-6	0	2020-1-8
2020-1-7	0	2020-1-8
2020-1-8	0	null
2020-1-9	0	null
2020-1-10	0	null
2020-1-1	1	2020-1-2
2020-1-2	1	2020-1-5
2020-1-3	1	2020-1-5
2020-1-4	1	2020-1-5
2020-1-5	1	null
2020-1-6	1	null
2020-1-7	1	null
2020-1-8	1	null
2020-1-9	1	null
2020-1-10	1	null

My idea was to filter a cross join of the two, e.g.:

CREATE TEMP TABLE some_next AS (
  SELECT
    day_date,
    id,
    CASE WHEN some_date > day_date THEN some_date ELSE NULL END AS next_date,
    ROW_NUMBER() OVER (PARTITION BY day_date, id ORDER BY some_date ASC) AS rn_next
  FROM a CROSS JOIN b
  WHERE some_date > day_date OR day_date >= (SELECT MAX(some_date) FROM b bb WHERE bb.id = b.id)
);

CREATE TABLE c AS (
  SELECT * FROM some_next WHERE rn_next = 1 ORDER BY id, day_date
);

I was looking for a simpler solution. Any ideas?

Answers

We can start with a cross join to get all combinations of IDs and days.

select
  day,
  id
from days
cross join ( select distinct id from some_days ) as ids
order by id, day;

Then use that to rank each id's some_days rows by how soon they follow each day. The order by from the first query is dropped inside the CTE because it is unnecessary there.

with days_ids as (
  select
    day,
    id
  from days
  cross join ( select distinct id from some_days ) as ids
)
select
  di.day as day,
  di.id,
  sd.day as some_day,
  row_number() over (
    partition by di.day, di.id
    order by sd.day asc
  ) as row_num
from days_ids di
left join some_days sd on di.day < sd.day and di.id = sd.id
order by di.id, di.day, row_num

And select only those rows with a row_number of 1.

with days_ids as (
  select
    day,
    id
  from days
  cross join ( select distinct id from some_days ) as ids
), 
matched_days as (
  select
    di.day as day,
    di.id,
    sd.day as some_day,
    row_number() over (
      partition by di.day, di.id
      order by sd.day asc
    ) as row_num
  from days_ids di
  left join some_days sd on di.day < sd.day and di.id = sd.id
)
select day, id, some_day
from matched_days
where row_num = 1
order by id, day

Demonstration.
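
As a possible shortcut, the ranking step can be replaced by an aggregate: the earliest later day is simply the minimum of the matching some_days rows. The sketch below is an alternative formulation, not the query from the answer above. It reuses that answer's table and column names (days.day, some_days.id, some_days.day), and the aliases d and ids are made up for illustration.

```sql
-- Alternative sketch: aggregate with min() instead of ranking with row_number().
select
  d.day,
  ids.id,
  min(sd.day) as some_day  -- null when no later some_days row exists for this id
from days d
cross join ( select distinct id from some_days ) as ids
left join some_days sd on sd.id = ids.id and sd.day > d.day
group by d.day, ids.id
order by ids.id, d.day;
```

Because of the left join, a day with no later match still produces one row per id, and min() over only null values yields null.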

- ErwinBrandstetter
- March 15, 2024 at 3:45 am
A simpler solution – tested on Postgres, not Redshift (which is not Postgres at all):
```
SELECT a.day_date, b1.id
    , (SELECT b.some_date
       FROM   b
       WHERE  b.id = b1.id
       AND    b.some_date > a.day_date
       ORDER  BY b.some_date
       LIMIT  1) AS next_date
FROM   a
CROSS  JOIN (SELECT DISTINCT id FROM b) b1
ORDER  BY 2, 1;
```
fiddle
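
Since the question asks for a table c rather than a result set, the same query can presumably be wrapped in CREATE TABLE ... AS. A minimal sketch, assuming the tables a and b from the question and that no table named c exists yet:

```sql
-- Materialize the result of the query above as table c.
CREATE TABLE c AS
SELECT a.day_date, b1.id
    , (SELECT b.some_date
       FROM   b
       WHERE  b.id = b1.id
       AND    b.some_date > a.day_date  -- strictly later than the current day
       ORDER  BY b.some_date
       LIMIT  1) AS next_date           -- null when no later date exists
FROM   a
CROSS  JOIN (SELECT DISTINCT id FROM b) b1
ORDER  BY 2, 1;
```

The ORDER BY only influences the order in which rows are inserted. Queries against c still need their own ORDER BY to get sorted output.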

With an index on b(id, some_date) it can perform decently, as the correlated subquery results in one very fast index-only scan for each result row.
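
The index mentioned above could be created like this. The index name is hypothetical, so pick whatever fits your naming scheme.

```sql
-- Multicolumn index matching the correlated subquery:
-- equality on id first, then range and sort on some_date.
CREATE INDEX b_id_some_date_idx ON b (id, some_date);

-- Index-only scans depend on the visibility map, which VACUUM maintains.
VACUUM ANALYZE b;
```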
There is a (much) faster way to get distinct b.id if there are only a few distinct values in a big table. See:
- SELECT DISTINCT is slower than expected on my table in PostgreSQL
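
One widely used technique for this is a recursive CTE that emulates a loose index scan, jumping from one distinct id to the next. The sketch below illustrates that general idea and is not a quote from the linked answer. It assumes an index with id as the leading column, such as one on b(id, some_date).

```sql
-- Emulated loose index scan: fetch the smallest id, then repeatedly
-- fetch the next larger id until none is left.
WITH RECURSIVE ids AS (
   (SELECT id FROM b ORDER BY id LIMIT 1)
   UNION ALL
   SELECT (SELECT b.id FROM b WHERE b.id > ids.id ORDER BY b.id LIMIT 1)
   FROM   ids
   WHERE  ids.id IS NOT NULL  -- stop once the lookup comes back empty
)
SELECT id
FROM   ids
WHERE  id IS NOT NULL;        -- drop the terminating null row
```

The result can stand in for the (SELECT DISTINCT id FROM b) subquery in the cross join.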