How to get the row-wise median over several columns of a PostgreSQL table?

ludwig
May 8, 2024
212 views
0 votes
2 Answers

My question is quite similar to an existing one about the row-wise mean, but instead of the mean, I need the median across several columns of each row (in order to identify some outliers).

Suppose I have the following table:

X   Y    Z
-------------
6   3    3
5   6    NULL
4   5    6
11  7    8

What I need is:

MEDIAN
-------------
3
5 or 5.5
5
8

With an even number of non-NULL elements, the median could be either one of the two central values or their average. I don't mind which; any version would be fine for me.

Ideally, I would like to define this solution as a PostgreSQL function, so that I can simply use it like GREATEST/LEAST, but any idea would be appreciated.

Tags: postgresql sql

Answers

- MikeOrganek
- May 8, 2024 at 8:47 pm
- 0 votes
Flipping it to jsonb and then back so as to use the percentile_cont() function is a lazy way to do this:
```
select t.*, percentile_cont(0.5) within group (order by e.v::int)
  from mytest t
       cross join lateral jsonb_each_text(to_jsonb(t)) e(k,v)
 group by t.x, t.y, t.z;
```
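
The same idea can probably be written without the jsonb round trip by unnesting an array of the columns inside a lateral subquery. This is only a sketch, assuming the table is `mytest` with integer columns `x`, `y`, `z` as in the query above; the aliases `u`, `v`, and `m` are made up for illustration. Because the aggregate runs once per row inside the subquery, no `group by` is needed.

```sql
-- Sketch only: assumes table mytest with integer columns x, y, z.
select t.*, m.median
  from mytest t
       cross join lateral (
           -- percentile_cont ignores NULL inputs, so a NULL column simply drops out
           select percentile_cont(0.5) within group (order by v) as median
             from unnest(array[t.x, t.y, t.z]) as u(v)
       ) m;
```

With an even number of non-NULL values, `percentile_cont(0.5)` interpolates between the two central values (5.5 for the row `5, 6, NULL`); `percentile_disc(0.5)` would return one of the actual values instead.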

As a function, using an array and sorting it with the extension intarray, it could be done like this:

```
-- intarray provides sort() for integer arrays
create extension if not exists intarray;

-- Returns one median per row of the table stats (columns id, x, y, z), ordered by id
create or replace function median() returns setof decimal as $$
declare
    r record;
begin
    for r in
        select sort(array_remove(array[x, y, z], null)) as array_sorted,
               array_length(array_remove(array[x, y, z], null), 1) as len
          from stats
         order by id
    loop
        if mod(r.len, 2) != 0 then
            -- odd count: take the middle element
            return next r.array_sorted[(r.len + 1) / 2];
        else
            -- even count: average the two central elements
            return next (r.array_sorted[r.len / 2] + r.array_sorted[r.len / 2 + 1]) / 2::decimal;
        end if;
    end loop;
end;
$$ language plpgsql;
```
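
For the GREATEST/LEAST-style usage asked for in the question, one possible approach is a variadic SQL function that takes the column values directly instead of reading a fixed table. This is a minimal sketch, not a tested drop-in: the name `row_median` is hypothetical, and it swaps the intarray sort for the built-in `percentile_cont` aggregate, which needs no extension.

```sql
-- Hypothetical helper: row_median is not a built-in function.
create or replace function row_median(variadic vals double precision[])
returns double precision
language sql
immutable
as $$
    -- percentile_cont ignores NULL inputs, so NULL arguments drop out
    select percentile_cont(0.5) within group (order by v)
      from unnest(vals) as u(v);
$$;

-- Usage, assuming a table mytest with columns x, y, z:
select x, y, z, row_median(x, y, z) as median
  from mytest;
```

Integer columns should be cast to `double precision` implicitly at the call site. For a row with an even number of non-NULL values the result is the average of the two central values; if all arguments are NULL, the result is NULL.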
