
Simplified scenario:

I want to update one table A (10 million rows) with a value from another table B (20 million rows). The two tables are linked by ids.

It took more than 7 hours to update the whole thing in one go. (I don’t know exactly how long, because I stopped the script partway through.)
So my idea is to update table A in batches using OFFSET and LIMIT clauses. So far with no luck.

Wrapped in a procedure, the code looks like this:

DECLARE 
    offset_number integer := 0;
    batch_size integer := 1000; 

BEGIN
    LOOP 
        UPDATE TableA temp1
        SET TableA_column_value_to_be_updated = (
            SELECT 
                tableB_column_value
            FROM 
                TableB temp2
            WHERE 
                temp2.id = temp1.id 
                AND some_other_conditions_in_TableB    
            ) 
        WHERE    
             some_other_conditions_in_Table
        OFFSET offset_number 
        LIMIT batch_size ; 
            
        COMMIT; 
    
        offset_number := offset_number + batch_size;
        EXIT WHEN NOT FOUND; 
    END LOOP;
END;

The engine reports an error with this exception:

org.jkiss.dbeaver.model.sql.DBSQLException:
SQL Error [42601]: ERROR: syntax error at or near "OFFSET"

I have no idea what it means. Notably, the statement seems to work without OFFSET and LIMIT.

Any ideas why this happens? Should I use a different loop statement?

2 Answers


  1. LIMIT and OFFSET are not part of the syntax of an SQL UPDATE statement; you need a SELECT for that.
    Also, OFFSET scales poorly for "paginating" a big table. Remember the upper bound from the last iteration instead.

    Something like this could work:

    CREATE OR REPLACE PROCEDURE upd_in_batches(_batch_size int = 1000)
      LANGUAGE plpgsql AS
    $proc$
    DECLARE
       _id_bound int = 0;  -- or whatever?
    BEGIN
       LOOP
          WITH sel AS (
             SELECT a.id  -- id = PK!
             FROM   tablea a
             WHERE  a.id > _id_bound
          -- AND    <some other conditions in Table A>
             ORDER  BY a.id
             LIMIT  _batch_size
             FOR    UPDATE
             )
          , upd AS (
             UPDATE tablea a
             SET    target_col = b.b_source_col
             FROM   sel s
             JOIN   tableb b USING (id)
             WHERE  a.id = s.id
             AND    a.target_col IS DISTINCT FROM b.b_source_col
             )
          SELECT max(id)  -- always returns a row
          FROM   sel
          INTO   _id_bound;
    
          IF _id_bound IS NULL THEN
              EXIT;  -- no more rows found; we're done, exit loop
          ELSE
             COMMIT;
          END IF;
       END LOOP;
    END
    $proc$;
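
    To run it, call the procedure from the top level; a minimal usage sketch (the argument is optional, since _batch_size defaults to 1000 as declared above):

    CALL upd_in_batches(5000);

    Note that COMMIT inside a procedure only works if the CALL itself is not wrapped in an outer transaction block.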
    

    Use a SELECT statement instead to apply your LIMIT. To avoid race conditions with concurrent writes, throw in a locking clause (FOR UPDATE). You may or may not need that.

    You might be able to UPDATE directly and just increment the lower and upper bounds of the filter on id instead, which is cheaper; a sketch of that variant follows below. It depends on the details of your setup and requirements. Each approach has its caveats.
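
    For illustration, a minimal sketch of that direct range variant, assuming id is an integer primary key without huge gaps (the procedure name upd_by_id_range and the variables _lo and _max_id are made up here; target_col / b_source_col are the placeholder columns from above):

    CREATE OR REPLACE PROCEDURE upd_by_id_range(_batch_size int = 10000)
      LANGUAGE plpgsql AS
    $proc$
    DECLARE
       _lo     int := 0;   -- lower bound of the current id range
       _max_id int;        -- highest id present in tablea
    BEGIN
       SELECT max(id) INTO _max_id FROM tablea;

       WHILE _lo < _max_id LOOP
          UPDATE tablea a
          SET    target_col = b.b_source_col
          FROM   tableb b
          WHERE  a.id = b.id
          AND    a.id >  _lo
          AND    a.id <= _lo + _batch_size
          AND    a.target_col IS DISTINCT FROM b.b_source_col;
          -- add the other conditions on tablea / tableb here as needed

          _lo := _lo + _batch_size;
          COMMIT;
       END LOOP;
    END
    $proc$;

    With big gaps in the id sequence, some of these batches touch few or no rows, so a fixed id range does not translate into a fixed number of updated rows per transaction.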


  2. The issue is the use of OFFSET and LIMIT inside the UPDATE statement. PostgreSQL, like most SQL dialects, does not support OFFSET and LIMIT in an UPDATE query; they are only valid in SELECT statements.

    Here’s an alternative solution:

    DECLARE 
        offset_number INTEGER := 0;
        batch_size INTEGER := 1000; 
    
    BEGIN
        LOOP 
            -- Update rows in batches using a subquery to limit the rows processed
            WITH cte AS (
                SELECT temp1.id
                FROM TableA temp1
                WHERE some_other_conditions_in_Table
                ORDER BY temp1.id
                OFFSET offset_number
                LIMIT batch_size
            )
            UPDATE TableA temp1
            SET TableA_column_value_to_be_updated = (
                SELECT tableB_column_value
                FROM TableB temp2
                WHERE temp2.id = temp1.id AND some_other_conditions_in_TableB
            )
            WHERE temp1.id IN (SELECT id FROM cte);
    
            -- Commit after each batch
            COMMIT;
    
            -- Exit loop if no more rows are returned
            EXIT WHEN NOT FOUND;
    
            -- Increment the offset for the next batch
            offset_number := offset_number + batch_size;
        END LOOP;
    END;
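
    As in the question, this block still needs to be wrapped in a procedure or a DO block to run; for example (a sketch; COMMIT inside DO requires PostgreSQL 11+ and the block must not run inside an outer transaction):

    DO
    $do$
    DECLARE
        offset_number INTEGER := 0;
        batch_size INTEGER := 1000;
    BEGIN
        -- paste the LOOP ... END LOOP body from above here
        NULL;
    END
    $do$;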