
My question is quite similar to this, but instead of the mean, I need to find the median value over different columns of a row (in order to identify some outliers).

Suppose I have the following table:

X   Y    Z
-------------
6   3    3
5   6    NULL
4   5    6
11  7    8

What I need is:

MEDIAN
-------------
3
5 or 5.5
5
8

If a row has an even number of non-NULL values, the median could be either one of the two middle values or their average. I don't mind which; either version is fine for me.
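For reference, if the non-NULL values of a row are sorted as $v_{(1)} \le v_{(2)} \le \dots \le v_{(n)}$, the two conventions described above are:

$$
\operatorname{median}(v) =
\begin{cases}
v_{((n+1)/2)} & \text{if } n \text{ is odd,} \\[4pt]
v_{(n/2)} \ \text{ or } \ \tfrac{1}{2}\left(v_{(n/2)} + v_{(n/2+1)}\right) & \text{if } n \text{ is even.}
\end{cases}
$$

For the second row, $n = 2$ with values 5 and 6, which gives 5 or $(5+6)/2 = 5.5$.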

Ideally, I would like to define this as a PostgreSQL function so that I can call it like GREATEST/LEAST, but any idea would be appreciated.


Answers


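  1. Converting each row to jsonb and expanding it back into rows, so that the percentile_cont() aggregate can be used, is a lazy way to do this. For an even number of non-NULL values, percentile_cont(0.5) returns the average of the two middle values (5.5 for the second row):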

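    -- to_jsonb(t) turns the whole row into {"x": .., "y": .., "z": ..} (every column of t,
    -- so an extra column such as an id would be included too); jsonb_each_text() unpacks it
    -- into one (key, value) row per column, with JSON nulls becoming SQL NULLs, which
    -- percentile_cont() ignores. Note that identical rows are merged by the GROUP BY.
    select t.*, percentile_cont(0.5) within group (order by e.v::int) as median
      from mytest t
           cross join lateral jsonb_each_text(to_jsonb(t)) e(k,v)
     group by t.x, t.y, t.z;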
    

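  2. As a function, you can collect the values in an array, drop the NULLs, and sort it with sort() from the intarray extension. This assumes integer columns and a table stats with an id column that defines the row order: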

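    -- intarray provides sort() for integer[]; its functions reject arrays containing NULLs,
    -- which is why array_remove() strips them first.
    create extension if not exists intarray;
    
    -- Returns one median per row of stats(id, x, y, z), in id order.
    create or replace function median() returns setof decimal as $$
    declare
        r record;
    begin
        for r in
            select sort(array_remove(array[x,y,z], null)) as array_sorted,      -- non-NULL values, ascending
                   array_length(array_remove(array[x,y,z], null), 1) as len  -- NULL if all values are NULL
              from stats
             order by id
        loop
            if mod(r.len, 2) <> 0 then
                -- odd count: the single middle element
                return next r.array_sorted[(r.len + 1) / 2];
            else
                -- even count: average of the two middle elements
                -- (an all-NULL row has len NULL, lands here, and yields NULL)
                return next (r.array_sorted[r.len / 2] + r.array_sorted[r.len / 2 + 1]) / 2::decimal;
            end if;
        end loop;
    end;
    $$ language plpgsql;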
    


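The question asks for something that can be called like GREATEST/LEAST. One way to get that, sketched here under stated assumptions rather than taken from either answer, is a VARIADIC SQL function wrapping the same percentile_cont() aggregate used in the first answer. The name median_of is hypothetical; the sketch assumes PostgreSQL 9.4 or later (needed for WITHIN GROUP) and columns that convert implicitly to numeric, and like the first answer it averages the two middle values for an even count:

```sql
-- median_of() is a hypothetical helper, not a built-in PostgreSQL function.
-- VARIADIC lets it be called like GREATEST/LEAST: median_of(x, y, z).
create or replace function median_of(variadic vals numeric[])
returns numeric
language sql
immutable
as $$
  -- unnest() turns the arguments into rows; percentile_cont() skips NULLs
  -- and interpolates (averages the two middle values) when the count is even.
  -- It returns NULL when every argument is NULL.
  select (percentile_cont(0.5) within group (order by v::double precision))::numeric
  from unnest(vals) as u(v)
$$;

-- Example on the question's data, supplied inline so no table is needed:
-- expected medians are 3, 5.5, 5 and 8.
select x, y, z, median_of(x, y, z) as median
from (values (6, 3, 3), (5, 6, null), (4, 5, 6), (11, 7, 8)) as t(x, y, z);
```

Against a real table this becomes `select x, y, z, median_of(x, y, z) from mytest;` (mytest being the table name from the first answer). Unlike the jsonb approach it does not pick up extra columns such as an id, and unlike the intarray function it is not tied to one table or limited to integers.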