Calculate 2 cumulative sums for 2 different groups - PostgreSQL

x89
February 8, 2023
291 views
2 votes
4 Answers

I have a table that looks like this:

id   position    value
5    senior      10000
6    senior      20000
8    senior      30000
9    junior      5000
4    junior      7000
1    junior      7000
3    junior      10000

It is sorted by position and value (asc) already. I want to calculate the number of seniors and juniors that can fit in a budget of 50,000 such that preference is given to seniors.

So for example, here 2 seniors (first and second) + 3 juniors can fit in the budget of 50,000.

id   position    value     cum_sum
5    senior      10000     10000
6    senior      20000     30000
8    senior      30000     60000   ----not possible because it is more than 50000
-----------------------------------   --- so out of 50k, 30k is used for 2 seniors.
9    junior      5000      5000 
4    junior      7000      12000
1    junior      7000      19000 ---with the remaining 20k, these 3 juniors can also fit
3    junior      10000     29000

So the output should look like this:

juniors    seniors
3          2

How can I achieve this in SQL?
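To experiment with the answers below, the sample data can be loaded with a short setup script. This is only a minimal sketch: the table name test_table matches the first and third answers, the column types are assumptions (the question does not state them), and the row with id 1 is taken from the cumulative-sum listing above.

```sql
-- Assumed schema: column types are a guess, not given in the question.
create table test_table (
    id       int primary key,
    position text not null,
    value    int  not null
);

-- The seven rows from the question's cumulative-sum listing.
insert into test_table (id, position, value) values
    (5, 'senior', 10000),
    (6, 'senior', 20000),
    (8, 'senior', 30000),
    (9, 'junior',  5000),
    (4, 'junior',  7000),
    (1, 'junior',  7000),
    (3, 'junior', 10000);
```

The second answer reads from mytable and the fourth from people, so the table would need to be renamed (or created under that name) to run those queries.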

Answers

Here is an example using running totals:

select
    -- count the rows whose running total still fits within the budget
    count(case when chek_sum_jun >= 0 and position = 'junior' then position end) chek_jun,
    count(case when chek_sum_sen >= 0 and position = 'senior' then position end) chek_sen
from (
    select position,
           -- 20000 is hard-coded: what is left of the 50000 after the two seniors (30000)
           20000 - sum(case when position = 'junior' then value else 0 end)
                   over (partition by position order by value asc
                         rows between unbounded preceding and current row) chek_sum_jun,
           50000 - sum(case when position = 'senior' then value else 0 end)
                   over (partition by position order by value asc
                         rows between unbounded preceding and current row) chek_sum_sen
    from test_table) x

Demo: https://dbfiddle.uk/ZgOoSzF0

PostgreSQL supports window aggregates such as SUM(col) OVER (...). Note that this query compares each position's running total against the full 50,000 separately, so it does not deduct what the seniors use from the juniors' budget:

with cte as (
  -- running total per position, cheapest first (id breaks ties)
  SELECT *, SUM(value) OVER(PARTITION BY position ORDER BY value, id) AS cumulative_sum
  FROM mytable
)
select position, count(1)
from cte
where cumulative_sum <= 50000
group by position

Another way to do it, returning the result in one row:

with cte as (
  SELECT *, SUM(value) OVER(PARTITION BY position ORDER BY value, id) AS cumulative_sum
  FROM mytable
),
cte2 as (
  select position, count(1) as _count
  from cte
  where cumulative_sum <= 50000
  group by position
)
select
sum(case when position = 'junior' then _count else null end) juniors,
sum(case when position = 'senior' then _count else null end) seniors
from cte2
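If the juniors should only get what the seniors leave over, the running totals can still be used without recursion by first working out how much the seniors spend. The following is a minimal sketch under that assumption, reading from the same mytable; the names sums, senior_spend and spent are made up for illustration.

```sql
with sums as (
  -- running total per position, cheapest first (id breaks ties)
  select position,
         sum(value) over (partition by position order by value, id) as running_total
  from mytable
),
senior_spend as (
  -- total used by the seniors that fit; 0 when none fit
  select coalesce(max(running_total), 0) as spent
  from sums
  where position = 'senior'
    and running_total <= 50000
)
select
  -- juniors may only use what the seniors left over
  count(*) filter (where s.position = 'junior'
                     and s.running_total <= 50000 - ss.spent) as juniors,
  count(*) filter (where s.position = 'senior'
                     and s.running_total <= 50000)            as seniors
from sums s
cross join senior_spend ss;
```

On the seven rows from the question's cumulative-sum listing, the seniors use 30,000 and the three cheapest juniors total 19,000, so this should return 3 juniors and 2 seniors.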


- Zegarek
- February 8, 2023 at 10:13 pm
- 0 votes
For it to work, the sum has to be not only cumulative, but also selective. As mentioned in the comment, you can achieve that with a recursive CTE:
```
with recursive 
 ordered as --this will be fed to the actual recursive cte
(   select *,
           row_number() over (order by position desc,value asc) 
    from test_table)
,recursive_cte as 
( select id,
         position,
         value, 
         value*(value<=50000)::int as cum_sum, --add the value only if it fits the budget
         value<=50000 as is_hired,
         2 as next_i
  from ordered
  where row_number=1
  union
  select o.id,
         o.position,
         o.value, 
         --keep the previous sum when this row would exceed the budget
         case when o.value+r.cum_sum<=50000 then o.value+r.cum_sum else r.cum_sum end,
         (o.value+r.cum_sum)<=50000 as is_hired,
         r.next_i+1 as next_i
  from recursive_cte r, 
       ordered o
  where o.row_number=next_i
)
select count(*) filter (where position='junior') as juniors,
       count(*) filter (where position='senior') as seniors
from recursive_cte 
where is_hired;
```
- row_number() over () is a window function.
- count(*) filter (where...) is an aggregate filter. It’s a faster variant of the sum(case when expr then a else 0 end) or count(case when expr then 1 end) approach, for when you only wish to aggregate a specific subset of rows. Here it just puts the counts in columns, as in your expected result; the same counts could be returned stacked with select position, count(*) from recursive_cte where is_hired group by position.

All it does is order your list according to your priorities in the first CTE, then go through it row by row in the second one, adding each value to the cumulative sum only if the result stays within your limit/budget.
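To see that the aggregate filter and the case-based forms count the same rows, here is a small self-contained comparison. The inline values list and the column aliases are made up for illustration and are not part of the answer's query.

```sql
-- Three equivalent ways to count only the 'junior' rows.
select count(*) filter (where position = 'junior')          as juniors_filter,
       sum(case when position = 'junior' then 1 else 0 end) as juniors_sum_case,
       count(case when position = 'junior' then 1 end)      as juniors_count_case
from (values ('junior'), ('senior'), ('junior')) as t(position);
```

All three columns should come back as 2 for this three-row input.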

- JohnLBevan
- February 8, 2023 at 10:21 pm
- 0 votes
Here’s one possible solution:
```
with seniorsCte as (
  select id, position, value, total
  from budget b
  inner join (
    select id, position, value, (sum(value) over (order by value, id)) total
    from people
    where position = 'senior'
  ) as s 
  on s.total <= b.amount
)
, juniorsCte as (
  select j.id, j.position, j.value, j.total + r.seniorsTotal as total
  from (
    select  coalesce(max(total), 0) seniorsTotal
    , max(b.amount) - coalesce(max(total), 0) remainingAmount
    from budget b
    -- left join so the budget row survives even when no senior fits
    left join seniorsCte on true
  ) as r
  inner join (
    select id, position, value, (sum(value) over (order by value, id)) total
    from people
    where position = 'junior'
  ) as j
  on j.total <= r.remainingAmount
)
/* use this if you want the specific records
select *
from seniorsCte 
union all
select *
from juniorsCte
*/
select (select count(1) from seniorsCte) seniors
, (select count(1) from juniorsCte) juniors
```
From your question I suspect you’re familiar with window functions, but in case not: the query below pulls back all rows from the people table where the position is senior and adds a column, total, holding the cumulative total of value over the rows returned, starting with the lowest value and ascending. The secondary sort by id ensures consistent behaviour if multiple rows have the same value; that’s not strictly required if we’re happy to get those in an arbitrary order.
```
select id, position, value, (sum(value) over (order by value, id)) total
from people
where position = 'senior'
```
I use the budget table just to hold a single row/value stating what our cutoff is; this avoids hardcoding the 50k value you mentioned, so we can easily amend it as required.
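For reference, the budget table could be set up roughly as follows. This is a sketch under assumptions: the answer only tells us that budget has an amount column holding a single row, so the column type and constraint here are guesses, and people is assumed to hold the rows from the question.

```sql
-- Assumed definition: a single-row table holding the cutoff.
create table budget (
    amount int not null
);

insert into budget (amount) values (50000);

-- To try a different cutoff later, change the one row instead of editing the query:
-- update budget set amount = 60000;
```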

I’ve used common table expressions (CTEs) so that we can filter the juniors subquery based on the output of the seniors subquery (we only want those juniors up to the difference between the budget and the seniors’ total), whilst still being able to return the juniors and seniors results independently. If we wanted the actual rows rather than just the counts, this lets us perform a union all between the two sets, as demonstrated in the commented-out code.