Simplified scenario:
I want to update one table A (10 million rows) with a value from another table B (20 mil rows). The two tables are linked with ids.
It took more than 7hrs to update the whole thing in one go. (I don’t know exactly how long as I stopped the script in the middle.)
So my idea is to update table A in batches using OFFSET
and LIMIT
clauses. So far with no luck.
Wrapped up in a procedure, the code looks like below:
DECLARE
offset_number integer := 0;
batch_size integer := 1000;
BEGIN
LOOP
UPDATE TableA temp1
SET TableA_column_value_to_be_updated = (
SELECT
tableB_column_value
FROM
TableB temp2
WHERE
temp2.id = temp1.id
AND some_other_conditions_in_TableB
)
WHERE
some_other_conditions_in_Table
OFFSET offset_number
LIMIT batch_size ;
COMMIT;
offset_number := offset_number + batch_size;
EXIT WHEN NOT FOUND;
END LOOP;
END;
The engine reports an error with exception:
org.jkiss.dbeaver.model.sql.DBSQLException:
SQL Error [42601]: ERROR: syntax error at or near "OFFSET"
I have no idea what it is. Notably, it seems to work without OFFSET
and LIMIT
.
Any ideas why this would happen? Should I use other loop statement?
2
Answers
LIMIT
andOFFSET
are not in the syntax of an SQLUPDATE
statement. You needSELECT
for that.Also,
OFFSET
scales poorly to "paginate" a big table. Remember the upper bound from the last iteration instead.Something like this could work:
Use a
SELECT
statement instead to apply yourLIMIT
. To avoid race conditions with concurrent writes, throw in a locking clause (FOR UPDATE
). You may or may not need that.You might be able to i
UPDATE
directly and just increment lower & upper bound for the filter onid
instead, which is cheaper. Depends on the details of your setup and requirements. Each has its caveats.See:
The issue here is in using OFFSET and LIMIT within the UPDATE statement. Most SQL dialects (like PostgreSQL) do not support using OFFSET and LIMIT in an UPDATE query as they are usually used in SELECT statements.
Here’s an alternative solution: