I'm trying to fill NULL values in multiple columns (of different types, INT and VARCHAR) with the previous non-NULL value in a group ordered by date. Consider the following table:
```sql
CREATE TABLE IF NOT EXISTS test (
    id VARCHAR,
    date DATE,
    value_1 INT,
    value_2 VARCHAR
);
INSERT INTO test VALUES
    (1, '2022-01-04', 5, 'asdf'),
    (1, '2022-01-03', NULL, NULL),
    (1, '2022-01-02', NULL, 'def'),
    (1, '2022-01-01', 4, NULL),
    (2, '2022-01-04', 1, 'a'),
    (2, '2022-01-03', NULL, NULL),
    (2, '2022-01-02', 2, 'b'),
    (2, '2022-01-01', NULL, NULL);
```
I want every NULL replaced by the previous non-NULL value of its column within the same id (please treat value_1 as INTEGER and value_2 as a string).
Answers
One day, PostgreSQL may support the `IGNORE NULLS` option for the `LEAD` and `LAG` functions. In the meantime, you can use window functions to build groups, then select the maximum in each group.
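The query this answer refers to was not preserved. What follows is a minimal sketch of the group-then-max technique, not the original query. It assumes "previous" means the earlier date (`ORDER BY date` ascending). It also inlines the question's rows as a CTE named `test` so the query runs on its own; drop the CTE to query the real table. Finally, it uses `grp_2` as a hypothetical counterpart of `grp_1` for `value_2`.

```sql
-- Sketch only: the question's rows are inlined as a CTE named "test" so the
-- query runs on its own; drop this CTE to run it against the real table.
WITH test(id, date, value_1, value_2) AS (
    VALUES ('1', DATE '2022-01-04', 5,    'asdf'),
           ('1', DATE '2022-01-03', NULL, NULL),
           ('1', DATE '2022-01-02', NULL, 'def'),
           ('1', DATE '2022-01-01', 4,    NULL),
           ('2', DATE '2022-01-04', 1,    'a'),
           ('2', DATE '2022-01-03', NULL, NULL),
           ('2', DATE '2022-01-02', 2,    'b'),
           ('2', DATE '2022-01-01', NULL, NULL)
),
grouped AS (
    SELECT id, date, value_1, value_2,
           -- running count of non-NULL values per id: it only increases on a
           -- non-NULL row, so apart from group 0 (leading NULLs) each group is
           -- one non-NULL row plus the NULL rows that follow it
           COUNT(value_1) OVER (PARTITION BY id ORDER BY date) AS grp_1,
           COUNT(value_2) OVER (PARTITION BY id ORDER BY date) AS grp_2  -- hypothetical name
    FROM test
)
SELECT id, date,
       -- every group holds at most one non-NULL value, which MAX returns
       MAX(value_1) OVER (PARTITION BY id, grp_1) AS value_1,
       MAX(value_2) OVER (PARTITION BY id, grp_2) AS value_2
FROM grouped
ORDER BY id, date;
```

With the sample rows, id 1 comes out as (4, NULL), (4, def), (4, def), (5, asdf) for 2022-01-01 through 2022-01-04. The first `value_2` stays NULL because there is no earlier value to borrow.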
Explanation:
`COUNT(value_1) OVER (PARTITION BY id ORDER BY date ASC/DESC)` is a cumulative count that only increases when `value_1` is not null. As a result, for each consecutive value it takes (1, 2, 3, …), there is exactly 1 record where `value_1` is not null, and 0, 1 or several records where the field is null. Example:
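- 1st record (in the order of the `ORDER BY date` specified in the window): `value_1` is null => `count` is 0.
- 2nd record: `value_1` is not null => `count` is 1.
- 3rd record: `value_1` is null => `count` stays at 1.
- 4th record: `value_1` is null => `count` stays at 1.
- 5th record: `value_1` is not null => `count` is 2.
- … and so on.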
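A debug query makes this running count visible on the question's data. It is a sketch that assumes ascending date order and inlines the sample rows as a CTE named `test`. It prints the count, aliased `grp_1`, next to `value_1`:

```sql
-- Sketch only: sample rows inlined as a CTE named "test".
WITH test(id, date, value_1, value_2) AS (
    VALUES ('1', DATE '2022-01-04', 5,    'asdf'),
           ('1', DATE '2022-01-03', NULL, NULL),
           ('1', DATE '2022-01-02', NULL, 'def'),
           ('1', DATE '2022-01-01', 4,    NULL),
           ('2', DATE '2022-01-04', 1,    'a'),
           ('2', DATE '2022-01-03', NULL, NULL),
           ('2', DATE '2022-01-02', 2,    'b'),
           ('2', DATE '2022-01-01', NULL, NULL)
)
SELECT id, date, value_1,
       -- number of non-NULL value_1 seen so far for this id, current row included
       COUNT(value_1) OVER (PARTITION BY id ORDER BY date) AS grp_1
FROM test
ORDER BY id, date;
```

For id 2 (2022-01-01 to 2022-01-04) this yields `grp_1` = 0, 1, 1, 2. The leading NULL gets 0, and the NULL on 2022-01-03 shares group 1 with the value 2 before it.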
Using this pattern (with `grp_1` being that cumulative count), `MAX(value_1) OVER (PARTITION BY id, grp_1)` works on this partitioning of the records by taking the non-null value out of each group:

- `[count = 0]` has only 1 record, with `null` => the `max` is null.
- `[count = 1]` has 1 non-null and 2 null values => the `max` is the non-null value.
- `[count = 2]` has 1 non-null value (and an unspecified number of null values) => the `max` is the non-null value.

The pattern keeps going; for every value of `count` (except 0), the `max` window function always has exactly 1 non-null value to pick.

Edit:
The above query is the answer to the original version of the question. Now that the dates have been changed to be in the exact opposite order, the 3 occurrences of `ORDER BY date` must all be inverted to match. Basically, 2 negatives make a positive.

PostgreSQL `DISTINCT ON` can help whenever your spec says you want the one top record in a group. You can use it in a scalar subquery in an update. Since all it does is go to a row with a null and, from its context, point back at the row you want to borrow the value from, it's fairly intuitive.
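The linked demo is gone, so here is a minimal sketch of that UPDATE, not the answer's original code. It assumes a temporary copy of the question's table. The temp table shadows a permanent `test` for the session. It also assumes "previous" means the latest earlier date. The aliases `t` and `p` are arbitrary.

```sql
-- Hypothetical setup: a temporary copy of the question's table and rows.
CREATE TEMP TABLE test (id VARCHAR, date DATE, value_1 INT, value_2 VARCHAR);
INSERT INTO test VALUES
    ('1', '2022-01-04', 5, 'asdf'), ('1', '2022-01-03', NULL, NULL),
    ('1', '2022-01-02', NULL, 'def'), ('1', '2022-01-01', 4, NULL),
    ('2', '2022-01-04', 1, 'a'), ('2', '2022-01-03', NULL, NULL),
    ('2', '2022-01-02', 2, 'b'), ('2', '2022-01-01', NULL, NULL);

UPDATE test AS t
SET value_1 = COALESCE(t.value_1, (
        -- DISTINCT ON keeps the top row per id: the latest earlier non-NULL value_1
        SELECT DISTINCT ON (p.id) p.value_1
        FROM test AS p
        WHERE p.id = t.id AND p.date < t.date AND p.value_1 IS NOT NULL
        ORDER BY p.id, p.date DESC)),
    value_2 = COALESCE(t.value_2, (
        -- same lookup for value_2
        SELECT DISTINCT ON (p.id) p.value_2
        FROM test AS p
        WHERE p.id = t.id AND p.date < t.date AND p.value_2 IS NOT NULL
        ORDER BY p.id, p.date DESC))
WHERE t.value_1 IS NULL OR t.value_2 IS NULL;

SELECT * FROM test ORDER BY id, date;
```

The subqueries read the table as it was when the UPDATE started, because a statement never sees its own changes. So each NULL borrows from the original latest earlier non-NULL value, regardless of the order in which rows are updated.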
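Using window functions:

- `ARRAY_AGG` builds an array with a defined order.
- `ARRAY_REMOVE` removes the NULLs.
- `(array)[-1]` would take the last item; PostgreSQL arrays do not support negative subscripts, so the last element has to be reached another way (e.g. with `array_upper()`).

There is a readable version of the query, a debug version (once the array is built, it is a matter of taking the last item), and a single-query version (the lack of `[-1]` and the duplicated code make it "ugly").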
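The answer's queries were not preserved either. Below is a minimal sketch of the readable version. It assumes ascending date order and inlines the question's rows as a CTE named `test`. The names `arrays`, `arr_1` and `arr_2` are hypothetical. Replacing the final SELECT with `SELECT * FROM arrays` gives the debug view of the built arrays.

```sql
-- Sketch only: sample rows inlined as a CTE named "test".
WITH test(id, date, value_1, value_2) AS (
    VALUES ('1', DATE '2022-01-04', 5,    'asdf'),
           ('1', DATE '2022-01-03', NULL, NULL),
           ('1', DATE '2022-01-02', NULL, 'def'),
           ('1', DATE '2022-01-01', 4,    NULL),
           ('2', DATE '2022-01-04', 1,    'a'),
           ('2', DATE '2022-01-03', NULL, NULL),
           ('2', DATE '2022-01-02', 2,    'b'),
           ('2', DATE '2022-01-01', NULL, NULL)
),
arrays AS (
    SELECT id, date,
           -- all values seen so far for this id, in date order, NULLs removed
           ARRAY_REMOVE(ARRAY_AGG(value_1) OVER (PARTITION BY id ORDER BY date), NULL) AS arr_1,
           ARRAY_REMOVE(ARRAY_AGG(value_2) OVER (PARTITION BY id ORDER BY date), NULL) AS arr_2
    FROM test
)
SELECT id, date,
       -- no arr[-1] for regular arrays, so index the last element via array_upper();
       -- an empty array gives array_upper(...) = NULL and therefore a NULL value
       arr_1[array_upper(arr_1, 1)] AS value_1,
       arr_2[array_upper(arr_2, 1)] AS value_2
FROM arrays
ORDER BY id, date;
```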
A short window-functions alternative. The principle remains the same as in Lukasz' and Atmo's answers: it constructs a frame looking back at the `value_N`s from previous dates for the same ID, gets rid of the `NULL`s, then picks the latest non-`NULL`. It's just shortened and cleaned up by having:
- The window defined once and reused.
- An aggregate filter, to skip `NULL`s instead of collecting them and having to remove them.
- A `jsonb` array. Unlike a regular array, it allows negative subscripting, eliminating the need for a subquery, a CTE or duplicated code in order to call `array_upper()` and get the last element. That comes at the price of having to cast non-text types.

A negative index is equivalent to a positive one on a reversed array, so the same can be done with a regular array as long as it is constructed in reverse, by inverting the frame clause and the order.
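Below is a sketch of both variants; it is not the answer's original code. It assumes ascending date order and inlines the question's rows as a CTE named `test`. The window names `w` and `w_rev` and the `_alt` columns are hypothetical. The `_alt` columns use the reversed-frame regular array and should match the `jsonb` columns.

```sql
-- Sketch only: sample rows inlined as a CTE named "test".
WITH test(id, date, value_1, value_2) AS (
    VALUES ('1', DATE '2022-01-04', 5,    'asdf'),
           ('1', DATE '2022-01-03', NULL, NULL),
           ('1', DATE '2022-01-02', NULL, 'def'),
           ('1', DATE '2022-01-01', 4,    NULL),
           ('2', DATE '2022-01-04', 1,    'a'),
           ('2', DATE '2022-01-03', NULL, NULL),
           ('2', DATE '2022-01-02', 2,    'b'),
           ('2', DATE '2022-01-01', NULL, NULL)
)
SELECT id, date,
       -- FILTER skips NULLs; jsonb accepts negative subscripts, so ->> -1 reads the
       -- latest non-NULL value; ->> returns text, hence the cast back to int
       (jsonb_agg(value_1) FILTER (WHERE value_1 IS NOT NULL) OVER w ->> -1)::int AS value_1,
       jsonb_agg(value_2) FILTER (WHERE value_2 IS NOT NULL) OVER w ->> -1 AS value_2,
       -- regular-array alternative: the reversed order and frame put the current
       -- row first and earlier dates after it, so [1] is the latest non-NULL value
       (array_agg(value_1) FILTER (WHERE value_1 IS NOT NULL) OVER w_rev)[1] AS value_1_alt,
       (array_agg(value_2) FILTER (WHERE value_2 IS NOT NULL) OVER w_rev)[1] AS value_2_alt
FROM test
WINDOW w     AS (PARTITION BY id ORDER BY date),   -- defined once, reused
       w_rev AS (PARTITION BY id ORDER BY date DESC
                 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
ORDER BY id, date;
```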