skip to Main Content

I have a table tbl in Postgres with 50 million rows. The tbl has an index on column_1 and there are a lot of queries to this table like

select * from tbl
where column_1 = 'value'

Each query returns 0-30 rows, 10 on avarage.

Once a day I completely update data in table. The query is like

delete from tbl;
insert into tbl
select * from tbl_2;
commit;

The challenge I face is the query runs too long: about 2-3 hours. It’s probably because of index. Is there a way to speed up the data update and allow user to query tbl while it’s being updated.
If this is important – the update process is run in python Airflow and the queries come from python web app.

2

Answers


  1. Build a copy of the table:

    CREATE TABLE tbl_new (LIKE tbl);
    
    INSERT INTO tbl_new
    SELECT * FROM tbl_2;
    

    Then, create all required indexes on that table. Once you are done, switch tables:

    BEGIN;
    
    DROP TABLE tbl;
    
    ALTER TABLE tbl_new RENAME TO tbl;
    
    COMMIT;
    

    That will be fast.

    Login or Signup to reply.
  2. If nothing references this table, Laurenz’ Indiana Jones-style golden idol swap should be enough

    create table tbl_tmp (like tbl); 
    insert into tbl_tmp table tbl;
    create temp table index_definitions as select indexdef 
      from pg_indexes where schemaname='public' and tablename='tbl';
    begin; drop table tbl; alter table tbl_tmp rename to tbl; commit;
    
    do $p$ declare indexdef_ text; 
    begin for indexdef_ in select indexdef from index_definitions loop
              execute indexdef_;
          end loop;
          drop table index_definitions;
    end $p$;
    

    This also transfers the index definitions from the previous version of the table.

    If you are superuser and in mood for hacky solutions, you can disable the indexes for the time of your insert. This might be useful if the table is referenced by other objects because you keep the original table, whereas the swap loses the previous one and replaces it with a new one, so all references break and need to be re-established:

    begin work;
    
    update pg_index set indisvalid=false, indisready=false, indislive=false
    where indrelid = 'tbl'::regclass;
    
    delete from tbl;
    insert into tbl table tbl_2;
    
    update pg_index set indisvalid=true, indisready=true, indislive=true
    where indrelid = 'tbl'::regclass;
    reindex table tbl; 
    commit work;
    

    As pointed out by @JonSG, you might want to consider a materialized view that you’d just refresh, or even pg_ivm.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search