skip to Main Content

I’m using PostgreSQL and I have a table with the following columns: id, distance, and length. I want to order the table by distance and create a new column called cum_length using a window function. I also want to filter the rows so that only rows until cum_length value cross a certain threshold are included in the final result.

Example of input table:

id distance length
1 10 1
2 5 2
3 8 1
4 1 3
5 3 2
6 9 2

Desired output for a threshold of 6:

id distance length cum_length
4 1 3 3
5 3 2 5
2 5 2 7

This is the SQL that I came up with:

WITH ordered_table AS (
  SELECT id,
    distance,
    length,
    SUM(length) OVER (ORDER BY distance) AS cum_length
  FROM table)
SELECT *
FROM ordered_table
WHERE cum_length <= 6

But this omits the last row of the desired result.

3

Answers


  1. Here is it based on your ordered_table query and an extra lag window function to calculate previous_is_less than the threshold value. Records where previous_is_less is true or null qualify for selection.

    with t as 
    (
     select *, 
        lag(cum_length) over (order by cum_length) < 6 as previous_is_less
     from  
     (
      SELECT id, distance, length,
         SUM(length) OVER (order BY distance) AS cum_length
      from the_table
     ) as ordered_table
    )
    select id, distance, length, cum_length
    from t 
    where coalesce(previous_is_less, true)
    order by cum_length;
    

    DB-Fiddle demo

    Login or Signup to reply.
  2. Try the following:

    WITH ordered_table AS 
    (
      SELECT id, distance, length,
        SUM(length) OVER (ORDER BY distance) AS cum_length
      FROM table_name
    )
    SELECT id, distance, length, cum_length
    FROM ordered_table
    WHERE cum_length <= COALESCE((SELECT MIN(cum_length) FROM ordered_table WHERE cum_length > 6), 6)
    

    For your sample data, this is equivalent to WHERE cum_length <= 7.

    See demo

    Login or Signup to reply.
  3. May be interesting to compare with this one on large data set, only one sorting by distance and should stop scanning data as soon as the threshold is found

    with data(id, distance, length) as (
        select 1, 10, 1 FROM DUAL UNION ALL
        select 2, 5, 2 FROM DUAL UNION ALL
        select 3, 8, 1 FROM DUAL UNION ALL
        select 4, 1, 3 FROM DUAL UNION ALL
        select 5, 3, 2 FROM DUAL UNION ALL
        select 6, 9, 2 FROM DUAL -- UNION ALL
    ),
    rdata(id, distance, length, rn) as (
        select id, distance, length, row_number() over(order by distance) as rn
        from data 
    ),
    recdata(id, distance, length, rn, cumlength) as ( 
        select id, distance, length, rn, length
        from rdata d
        where rn = 1
        
        union all
        
        select d.id, d.distance, d.length, d.rn, d.length + r.cumlength
        from recdata r
        join rdata d on d.rn = r.rn + 1 and r.cumlength <= 6 
    )
    select id, distance, length, cumlength from recdata
    ;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search