skip to Main Content

To my amazement, I noticed that simple text search APIs don’t utilize the indexes on tables that are part of an RLS policy.For example, mytable has a policy on it, and I’m using a column called text_vector to search on:

SELECT inst_id, text_vector, ts_rank(text_vector, q) AS rank, q
  FROM mytable, plainto_tsquery('apple & orange') q
 WHERE text_vector @@ q

This has to do with them not being LEAKPROOF, I’m guessing. Is there any kind of workaround without altering the system tables (which I can’t do in RDS, for example)?

To reproduce, I performed the following:

CREATE TABLE mytable (inst_id INTEGER, text_content TEXT);

ALTER TABLE mytable 
ADD COLUMN text_vector TSVECTOR 
GENERATED ALWAYS AS (to_tsvector('english', text_content)) STORED;

CREATE INDEX idx_text_vector ON mytable USING GIN (text_vector);

INSERT INTO mytable (inst_id, text_content) VALUES (1, 'An apple a day is good for you.');
INSERT INTO mytable (inst_id, text_content) VALUES (2, 'My favorite color is orange.');
INSERT INTO mytable (inst_id, text_content) VALUES (3, 'You can have an orange or apple, not both.');

(inserted 10K more rows….)

EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT inst_id, text_vector, ts_rank(text_vector, q) AS rank, q
FROM mytable, plainto_tsquery('apple & orange') q
WHERE text_vector @@ q

Nested Loop  (cost=17.03..264.54 rows=100 width=159) (actual time=0.037..0.038 rows=1 loops=1)
  Output: mytable.inst_id, mytable.text_vector, ts_rank(mytable.text_vector, q.q), q.q
  Buffers: shared hit=8
  ->  Function Scan on pg_catalog.plainto_tsquery q  (cost=0.25..0.26 rows=1 width=32) (actual time=0.019..0.020 rows=1 loops=1)
        Output: q.q
        Function Call: plainto_tsquery('apple & orange'::text)
  ->  Bitmap Heap Scan on myschema.mytable  (cost=16.78..263.03 rows=100 width=123) (actual time=0.013..0.013 rows=1 loops=1)
        Output: mytable.inst_id, mytable.text_content, mytable.text_vector
        Recheck Cond: (mytable.text_vector @@ q.q)
        Heap Blocks: exact=1
        Buffers: shared hit=8
        ->  Bitmap Index Scan on idx_text_vector  (cost=0.00..16.75 rows=100 width=0) (actual time=0.010..0.010 rows=1 loops=1)
              Index Cond: (mytable.text_vector @@ q.q)
              Buffers: shared hit=7
Planning:
  Buffers: shared hit=1
Planning Time: 0.065 ms
Execution Time: 0.065 ms

(created a policy invoking a function has_access() on mytable)

CREATE OR REPLACE FUNCTION has_access(p_inst_id BIGINT) RETURNS BOOLEAN
LANGUAGE 'plpgsql' SECURITY DEFINER
AS $has_access$
DECLARE
  v_has_access BOOLEAN;
BEGIN
  v_has_access := (SELECT TRUE WHERE p_inst_id < 4);
  RETURN v_has_access;
END;
$has_access$;

CREATE POLICY pol_has_access 
ON mytable 
FOR SELECT TO PUBLIC USING (has_access(inst_id));
ALTER TABLE mytable ENABLE ROW LEVEL SECURITY;

(logged into the system as a non-owner with SELECT privs on mytable)

EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT inst_id, text_vector, ts_rank(text_vector, q) AS rank, q
FROM mytable, plainto_tsquery('apple & orange') q
WHERE text_vector @@ q

Nested Loop  (cost=0.25..5816.14 rows=33 width=159) (actual time=0.053..39.206 rows=1 loops=1)
  Output: mytable.inst_id, mytable.text_vector, ts_rank(mytable.text_vector, q.q), q.q
  Join Filter: (mytable.text_vector @@ q.q)
  Rows Removed by Join Filter: 2
  Buffers: shared hit=479
  ->  Function Scan on pg_catalog.plainto_tsquery q  (cost=0.25..0.26 rows=1 width=32) (actual time=0.018..0.020 rows=1 loops=1)
        Output: q.q
        Function Call: plainto_tsquery('apple & orange'::text)
  ->  Seq Scan on myschema.mytable  (cost=0.00..5732.41 rows=6671 width=123) (actual time=0.026..39.181 rows=3 loops=1)
        Output: mytable.inst_id, mytable.text_content, mytable.text_vector
        Filter: has_access((mytable.inst_id)::bigint)
        Rows Removed by Filter: 20010
        Buffers: shared hit=479
Planning Time: 0.048 ms
Execution Time: 39.220 ms

Note that the index is being used as owner, but when I perform the same query as a non-owner, the RLS policy prevents the index from being used.

2

Answers


  1. Yes, that must be caused by RLS. PostgreSQL must prevent that a non-LEAKPROOF functions is called before the condition imposed by the row-level security policy. Now the @@ operator is not leakproof. Consequently, that operator will never be used in an index scan condition, even if the RLS policy could be satisfied by the same index scan (e.g., a multi-column GiST index).

    The only remedy is to exempt the user from RLS on that table. Sorry, security is not for free.

    You might wonder why a harmless operator like @@ is not leakproof. PostgreSQL takes a very strict policy here: a function will be marked LEAKPROOF only if you can prove that no possible argument that you can supply can trigger an error that could allow you to guess any information about the other arguments. For example, + for integer is not leakproof, because an overflow error would leak information about the other argument.

    Login or Signup to reply.
  2. I did a small POC and added the has_access() as a condition to the index:

    CREATE INDEX idx_text_vector_has_access ON mytable USING GIN (text_vector) 
        WHERE has_access(inst_id);
    

    This works fine for me, this index is now used.

    Give it a try!

    By the way, the function has_access() has to be immutable.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search