skip to Main Content

My task:
The variables x and def are a counter and a boolean respectively. Both start at zero.

  1. If at any point in time x > 2 then def = 1 from that point onward.
  2. Now if def = 1 then in order for it to become zero again, x has to equal zero for 3 consecutive periods otherwise def remains 1.

I have a background in sas and am now starting to work more extensively with Postgresql:

data have;
input n x;
datalines;
1 0
2 1
3 2
4 3
5 4
6 2
7 1
8 0
9 1
10 0
11 0
12 0
13 0
14 1
15 2
16 3
17 4
;
run;


data want(drop=prev_: cur_m def rename=def_new=def);
set have;
retain  prev_def prev_cur_m;
if x > 2 then def = 1;
else if prev_def = 1 and x > 0 then def = 1;
else def = 0;

if _n_ = 1 then cur_m = -1;
else if def = 1 then cur_m = -1;
else if prev_def = 1 and def = 0 then cur_m = 0;
else if -1 < prev_cur_m < 3 and x > 0 then cur_m = 0;
else if -1 < prev_cur_m < 3 and x = 0 then cur_m + 1;
else cur_m = -1;

if _n_ = 1 then def_new = 0;
else if def = 1 then def_new = 1;
else if -1 < cur_m < 3 then def_new = 1;
else def_new = 0;

output;
prev_def = def;
prev_cur_m = cur_m;
run;

As far as I understood this can be achieved with cursors and attempted to solve the problem using this post. However, I got stuck on the step when I needed to use the values from the previous row in order to calculate the current ones. Here is the code:

create table have (
    n int,
    x int
) ;
insert into have (n,x)
values (1,0),(2,1),(3,2),(4,3),(5,4),(6,2),
       (7,1),(8,0),(9,1),(10,0),(11,0),(12,0),
       (13,0),(14,1),(15,2),(16,3),(17,4);
create or replace function calc_dpd()
returns table (
    n int,
    x int,
    def int,
    cure_m int,
    def_new int )
language plpgsql as $f$
declare rec record;
begin
    for rec in (select * from have order by n) loop 
        if rec.n = 1 then 
            def = 0;
            cure_m = 0;
            def_new = 0;
        end if;
        if x > 2 then 
            def = 1;
        end if;
--  here I need to test if def from previous row = 1 and current def = 0;
        select rec.n, rec.x
        into n,x;
        return next;
    end loop;
end $f$;

2

Answers


  1. PL/pgSQL can do it but you don’t need its complexity and overhead to achieve what you need: it’s all doable with window functions in plain, declarative SQL: demo

    1. 0=all(array[]) checks if all elements in the array are 0
    2. lag(x,that_many_rows_back,0) gets the value x from that many rows back. Or 0 if there aren’t that many rows prior to this one.
    3. You can also use every() or bool_and() as a window function, and combine that with a frame start clause so that the window frame only reaches 2 rows back.
    4. You can define the window in place, right in over() clause, or separately in window clause. The latter is preferable if it’s non-trivial or you need to re-use it.
    5. bool_or() checks if any of the values up to this one met the criteria. Here it’s limited to a window frame, so it only concerns the rows since the last time def switched off after getting 3 zeroes in x.
    6. The boolean::int cast maps false to 0, true to 1.
    with cte as (
        select *
          ,(x > 2) exceeded_2
          ,(every(x=0)over w1) as should_switch
        from have
        window w1 as (order by n rows 2 preceding) )
    ,cte2 as (
        select *
          ,sum(should_switch::int)over(order by n) def_on_period 
        from cte
    )
    select n,x,(bool_or(exceeded_2) over w2)::int as def
    from cte2
    window w2 as (partition by def_on_period 
                  order by n);
    
    n x def
    1 0 0
    2 1 0
    3 2 0
    4 3 1 first x>2
    5 4 1
    6 2 1
    7 1 1
    8 0 1
    9 1 1
    10 0 1
    11 0 1
    12 0 0 third zero in x
    13 0 0
    14 1 0
    15 2 0
    16 3 1 x>2
    17 4 1
    Login or Signup to reply.
  2. That’s one of the ways you could do the same with PL/pgSQL: demo

    1. The cure_m and def_new are gone – the spec only described behaviour for n, x, def.
    2. This keeps your loop over have, retaining 2 previous x values in prev_x and prev_x2.
    3. If..elsif only updates def in the two situations you described.
    4. 0=all(array[a,b,c]) checks if all 3 values are zero, which is used to zero-out def.
    create or replace function calc_dpd()
    returns table (
        n int,
        x int,
        def int )
    language plpgsql as $f$
    declare rec record;
            prev_x  int:=0;
            prev_x2 int:=0;
    begin
        def = 0;
        for rec in (select * from have order by n) loop 
            if rec.x > 2 then 
                def := 1;
            elsif 0=all(array[rec.x, prev_x, prev_x2]) then
                def := 0;
            end if;
            return query select rec.n, rec.x, def;
            prev_x2 := prev_x;
            prev_x  := rec.x;
        end loop;
    end $f$;
    
    select * from calc_dpd();
    
    n x def
    1 0 0
    2 1 0
    3 2 0
    4 3 1 first x>2
    5 4 1
    6 2 1
    7 1 1
    8 0 1
    9 1 1
    10 0 1
    11 0 1
    12 0 0 third zero in x
    13 0 0
    14 1 0
    15 2 0
    16 3 1 x>2
    17 4 1
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search