skip to Main Content

I have a table that stores basic information about files:

CREATE TABLE filez (
    id     INTEGER NOT NULL,
    name   TEXT NOT NULL,
    size   BIGINT NOT NULL
);
CREATE UNIQUE INDEX filez__id__idx ON filez USING (id);

I need to select a set of files with combined size that does not exceed some THRESHOLD. The following is a trivialized query which achieves that (it could use the NON MATERIALIZED CTE syntax, but that is not important in this case):

SELECT id, name
FROM (
        SELECT id, name,
            SUM(size) OVER w AS total_size -- accumulating size of files
        FROM filez
        WINDOW w AS (ORDER BY id ASC RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
        ORDER BY id ASC
    ) AS t
WHERE total_size <= THRESHOLD

I have a personal 😄 problem with this query: it goes over ALL records regardless whether the threshold was reached. I’d like the query to go over just the sufficient number of records (sorted by an indexed "id" field) and stop executing as soon as the threshold condition becomes false.

EDIT.

Maybe the problem of stopping a query could be generalized. If there was a function, let’s call it stop_query(BOOL):BOOL which could prematurely end query execution based on arbitrary condition, as in the following example:

SELECT f1, f2
FROM tab1
WHERE stop_query(<some condition>)

alternatively

SELECT f1, f2
FROM tab1
WHERE CASE WHEN <some condition> THEN stop_query() ELSE TRUE END

2

Answers


  1. You can put the query in a cte and add your ordering criteria in your query sum over.

    Then just select the data from this cte.

    with cte as (
        SELECT id, name, size,
            SUM(size) OVER (ORDER BY id asc) AS total_size -- accumulating size of files
        FROM filez
        ORDER BY id ASC
    )
    
    select * from cte WHERE total_size <= 10.00 -- THRESHOLD
    
    Login or Signup to reply.
  2. You could use a cursor-based function: demo

    create function filez_up_to_limit(p_limit numeric, p_cutoff_id int default 0) 
      returns table(id int,name text,size bigint)
    language plpgsql as $f$
    declare rc cursor for select * from filez 
                          where filez.id>p_cutoff_id 
                          order by filez.id;
            current_size int:=0;
    begin
     open rc;
     while current_size<p_limit loop
       fetch rc into id,name,size;
       return next;
       current_size:=current_size+size;
     end loop;
    end $f$;
    
    explain analyze verbose 
    select * from filez_up_to_limit(1e6);
    
    QUERY PLAN
    Function Scan on public.filez_up_to_limit (cost=0.25..10.25 rows=1000 width=44) (actual time=9.022..9.134 rows=1978 loops=1)
      Output: id, name, size
      Function Call: filez_up_to_limit(‘1000000’::numeric, 0)
    Planning Time: 0.037 ms
    Execution Time: 9.316 ms
    explain analyze verbose 
    select * from filez_up_to_limit(1e7);
    
    QUERY PLAN
    Function Scan on public.filez_up_to_limit (cost=0.25..10.25 rows=1000 width=44) (actual time=47.609..49.061 rows=20111 loops=1)
      Output: id, name, size
      Function Call: filez_up_to_limit(‘10000000’::numeric, 0)
    Planning Time: 0.019 ms
    Execution Time: 50.369 ms

    fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search