I have a table tbl in Postgres with 50 million rows. tbl has an index on column_1, and there are a lot of queries against this table like:
    select * from tbl
    where column_1 = 'value'
Each query returns 0-30 rows, 10 on average.
Once a day I completely replace the data in the table. The query looks like:
    delete from tbl;
    insert into tbl
    select * from tbl_2;
    commit;
The challenge I face is that this query runs too long: about 2-3 hours, probably because of the index. Is there a way to speed up the data update while still allowing users to query tbl while it is being updated?
In case it is important: the update process runs in Airflow (Python) and the queries come from a Python web app.
2 Answers
Build a copy of the table:
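For example (tbl_new is just an illustrative name for the scratch table):

    create table tbl_new (like tbl);

    insert into tbl_new select * from tbl_2;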
Then, create all required indexes on that table. Once you are done, switch tables:
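Something along these lines, assuming column_1 is the only indexed column:

    create index on tbl_new (column_1);

    begin;
    drop table tbl;
    alter table tbl_new rename to tbl;
    commit;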
That will be fast.
If nothing references this table, Laurenz' Indiana Jones-style golden idol swap should be enough.
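A compact sketch of that swap (again with an illustrative scratch name tbl_new):

    -- including all copies defaults, constraints and index definitions from tbl
    create table tbl_new (like tbl including all);

    insert into tbl_new select * from tbl_2;

    begin;
    drop table tbl;
    alter table tbl_new rename to tbl;
    commit;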
This also transfers the index definitions from the previous version of the table.
If you are a superuser and in the mood for hacky solutions, you can disable the indexes for the duration of your insert. This might be useful if the table is referenced by other objects, because this way you keep the original table, whereas the swap drops the previous one and replaces it with a new one, so all references break and need to be re-established:
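One way this trick is usually sketched (it writes to the pg_index catalog directly, so test it carefully before relying on it):

    -- stop maintaining the indexes on tbl during the bulk load
    update pg_index
    set indisready = false
    where indrelid = 'tbl'::regclass;

    delete from tbl;
    insert into tbl select * from tbl_2;

    -- re-enable index maintenance and rebuild the now-stale indexes
    update pg_index
    set indisready = true
    where indrelid = 'tbl'::regclass;

    reindex table tbl;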
As pointed out by @JonSG, you might want to consider a materialized view that you’d just refresh, or even pg_ivm.
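A minimal sketch of the materialized-view variant; the unique index is needed because REFRESH ... CONCURRENTLY requires one, and the id column here is a hypothetical unique key:

    create materialized view tbl_mv as
    select * from tbl_2;

    create unique index on tbl_mv (id);  -- hypothetical unique column
    create index on tbl_mv (column_1);

    -- daily refresh; CONCURRENTLY lets readers keep querying tbl_mv meanwhile
    refresh materialized view concurrently tbl_mv;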