skip to Main Content

I have a table which has id and group_id foreign key.

I want to create (N=2) buckets of equal size (approximately), such that all rows with given group_id end up in the same bucket.

For example, given:

id, group_id
1,1
2,1
3,2
4,NULL
5,1
6,2

I get:

id, group_id, bucket_id
1,1,1
2,1,1
3,2,2
4,NULL,2
5,1,1
6,2,2

Is there an easy way to achieve this that I’m missing?

2

Answers


  1. you should be able to use the NTILE analytic function

    Login or Signup to reply.
  2. This will do a roughly decent job for two buckets, since the rank will be the same for all like group_ids and you just set the cutoff point midway.

    Of course this is brittle, if you have a case with 10 records with {group id: count} like {1: 9, 2: 1} it will still be very skewed.

    fiddle: https://dbfiddle.uk/lbj5sXgo

    create table some_test_data 
      (
        id integer,
        group_id integer
      );
    
    insert into some_test_data values
      (1,1),
      (2,1),
      (3,3),
      (4,3),
      (5,4),
      (6,5),
      (7,5),
      (8,5),
      (9,6),
      (10,7);
    
    select *,
           case when rank() over ( order by group_id ) < count(1) over ()/2 then 1 
                else 2
            end as bucket
      from some_test_data;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search