skip to Main Content

I am trying to implement a Postgres function with optional parameters. I tried a stable PL/pgSQL function with required parameters, splitting cases along user input in an IF construct.
Here is a simplified example:

Setup:

CREATE TABLE IF NOT EXISTS test_tab1(
    field1 int PRIMARY KEY,
    field2 int
);
CREATE TABLE IF NOT EXISTS test_tab2(
    field1 int PRIMARY KEY,
    field2 int
);
INSERT INTO test_tab1 SELECT FLOOR(RANDOM()*10000), FLOOR(RANDOM()*10000) FROM GENERATE_SERIES(1,10);
INSERT INTO test_tab2 SELECT * FROM test_tab1;
    
CREATE OR REPLACE FUNCTION test_tab1_func()
    RETURNS SETOF test_tab1
    LANGUAGE sql STABLE 
    AS $$
        SELECT * FROM test_tab1;
$$;

CREATE OR REPLACE FUNCTION test_tab2_func()
    RETURNS SETOF test_tab1
    LANGUAGE sql STABLE 
    AS $$
        SELECT * FROM test_tab2;
$$;

The function that is the subject of my question:

CREATE OR REPLACE FUNCTION test_func(foo int DEFAULT null)
  RETURNS SETOF test_tab1
  LANGUAGE plpgsql STABLE AS
$$
BEGIN
    IF (foo IS null) THEN
        RETURN QUERY
        SELECT * FROM test_tab1_func();
    ELSE
        RETURN QUERY
        SELECT * FROM test_tab2_func();
    END IF;
END;
$$;
EXPLAIN SELECT * FROM test_func();

EXPLAIN returns:

Function Scan on test_func  (cost=0.25..10.25 rows=1000 width=8)

My experiments with more complex queries show that this Function Scan estimate is not very "smart", and will return basically this same result no matter the complexity of the function body. Since the EXPLAIN output is so uninformative, I have come here for more info on the execution plan.

My question is this: Since the DB presumably can plan for test_tab1_func, and it can plan for test_tab2_func, can I count on the DB to use the plan that it would ordinarily devise for test_tab1_func for test_func when foo is null, and the plan that it would ordinarily devise for test_tab2_func for test_func otherwise? If not, is there a better way to implement this sort of "case/switch" function logic that Postgres can more readily plan around?

2

Answers


  1. Your example updated to include a difference in outcomes for the functions:

    CREATE TABLE IF NOT EXISTS test_tab1(
        field1 serial PRIMARY KEY,
        field2 int
    );
    CREATE TABLE IF NOT EXISTS test_tab2(
        field1 serial PRIMARY KEY,
        field2 int
    );
    
    INSERT INTO test_tab1(field2) SELECT  FLOOR(RANDOM()*100000000) FROM GENERATE_SERIES(1,1000000);
    INSERT INTO test_tab2 SELECT * FROM test_tab1;
    
    CREATE OR REPLACE FUNCTION test_tab1_func()
        RETURNS SETOF test_tab1
        LANGUAGE sql STABLE 
        AS $$
            SELECT * FROM test_tab1 where field1 = 11;
        $$;
    
    
    CREATE OR REPLACE FUNCTION test_tab2_func()
        RETURNS SETOF test_tab1
        LANGUAGE sql STABLE 
        AS $$
            SELECT * FROM test_tab2 where field1 < 100000;
        $$;
    
    CREATE OR REPLACE FUNCTION test_func(foo int DEFAULT null)
        RETURNS SETOF test_tab1
        LANGUAGE plpgsql STABLE 
        AS $$
            BEGIN
                IF (foo IS null) THEN
                    RETURN QUERY
                    SELECT * FROM test_tab1_func();
                ELSE
                    RETURN QUERY
                    SELECT * FROM test_tab2_func();
                END IF;
            END;
    $$;
    
    EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test_func();
                                                        QUERY PLAN                                                     
    -------------------------------------------------------------------------------------------------------------------
     Function Scan on public.test_func  (cost=0.25..10.25 rows=1000 width=8) (actual time=0.375..0.376 rows=1 loops=1)
       Output: field1, field2
       Function Call: test_func(NULL::integer)
     Planning Time: 0.043 ms
     Execution Time: 0.957 ms
    
    EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test_func(1);
                                                            QUERY PLAN                                                         
    ---------------------------------------------------------------------------------------------------------------------------
     Function Scan on public.test_func  (cost=0.25..10.25 rows=1000 width=8) (actual time=99.505..159.962 rows=999999 loops=1)
       Output: field1, field2
       Function Call: test_func(1)
     Planning Time: 0.016 ms
     Execution Time: 185.767 ms
    
    
    

    Adding VERBOSE and ANALYZE shows that Postgres differentiates between the two functions and tracks what running each function on it’s own does:

    EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test_tab1_func();
                                                               QUERY PLAN                                                            
    ---------------------------------------------------------------------------------------------------------------------------------
     Index Scan using test_tab1_pkey on public.test_tab1  (cost=0.42..8.44 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=1)
       Output: test_tab1.field1, test_tab1.field2
       Index Cond: (test_tab1.field1 = 11)
     Planning Time: 0.055 ms
     Execution Time: 0.014 ms
    (5 rows)
    
    EXPLAIN (VERBOSE, ANALYZE) SELECT * FROM test_tab2_func();
                                                           QUERY PLAN                                                        
    -------------------------------------------------------------------------------------------------------------------------
     Seq Scan on public.test_tab2  (cost=0.00..16925.00 rows=999999 width=8) (actual time=0.005..74.222 rows=999999 loops=1)
       Output: test_tab2.field1, test_tab2.field2
       Filter: (test_tab2.field1 < 1000000)
       Rows Removed by Filter: 1
     Planning Time: 0.070 ms
     Execution Time: 96.227 ms
    
    Login or Signup to reply.
  2. Can I count on a simple case/switch plpgsql function to use a sane query plan?

    Yes, you can always count on "sane" query plans – in plain SQL or nested in any function. It’s always the same query planner, just called from a different context. An "insane" query plan would be cause for a bug report to the PostgreSQL Bug List <[email protected]> – after making sure the insanity wasn’t all yours.

    No, you cannot count on a sane query plan, since every nested SQL statement in a function gets its own plan – unless the function is inlined as a whole, which is only possible for SQL functions (LANGUAGE sql) that fulfill a couple of conditions. So not applicable to PL/pgSQL functions.

    Your call results in one plan for the calling SELECT * FROM test_func(); and another, separate (nested) plan for SELECT * FROM test_tab1_func();.
    SELECT * FROM test_tab2_func(); in the other leg of the IF construct is not planned at all until control actually reaches it.

    To inspect every detail of what’s happening behind the curtains, use auto_explain. It’s simple, but you must be superuser. See:

    Related:

    Your example

    Splitting into distinct queries along user input can make sense if that results in a more efficient query plan for at least one branch. But that’s not the case for your simple example, where the added overhead far outweighs the possible gain.

    Your function test_func() and call can be replaced with plain SQL:

    SELECT * FROM test_tab1_func() WHERE $foo IS NULL
    UNION ALL
    SELECT * FROM test_tab2_func() WHERE $foo IS NOT NULL;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search