
The problem is that we have multiple databases, and some of the tables can have millions of rows, so a query like

update my_table set column1='some'

takes 30 minutes or more, and sometimes the transaction fails because the connection is lost.

The idea is to somehow split the original query into parts at runtime and run them.

I hoped Liquibase would have a solution for this, but I didn't find one.

2 Answers


  1. I do include this type of statement in my Liquibase changelogs whenever I want to ensure that an update propagates across all of my environments, so this is a valid use case in my experience.

    Usually, what I do is shard the update by the id column, like:

    update my_table set column1='some' where id<500000

    update my_table set column1='some' where id>=500000

    You’ll have to decide what where clause makes the most sense for your data.
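
    For instance, a sketch of what this could look like in a formatted-SQL changelog (the author name, changeset ids, and the 500000 threshold are placeholders, and the example assumes a numeric id column):

    --liquibase formatted sql

    --changeset me:update-my-table-part-1
    update my_table set column1='some' where id < 500000;

    --changeset me:update-my-table-part-2
    update my_table set column1='some' where id >= 500000;

    Since each changeset runs in its own transaction by default, a failure in one part does not roll back the parts that already ran.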

  2. IMO, in the case of PostgreSQL the idea of "splitting the original query into parts" within a single connection/transaction does not make sense at all (I do not see any performance benefit there). Moreover, it is not clear what "sometimes transaction fails due connection lost or some" means – if you are experiencing infrastructure issues, you are supposed to resolve them by configuring/tuning the infrastructure.

    The Liquibase way of performing such a migration is the following:

    I. Define the SQL migration as:

    do
    $$
        begin
            raise exception 'please perform following update manually';
        end
    $$;
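
    For reference, embedded in a formatted-SQL changelog this might look like the following (author and changeset id are placeholders; splitStatements:false keeps Liquibase from splitting the do-block on its internal semicolons):

    --liquibase formatted sql

    --changeset me:manual-update-guard splitStatements:false
    do
    $$
        begin
            raise exception 'please perform following update manually';
        end
    $$;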
    

    II. The operator is then supposed to perform the following steps (sketched below):

    • run liquibase update and get the error
    • perform the update manually
    • run liquibase mark-next-changeset-ran
    • run liquibase update again
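
    A sketch of that workflow from the command line (the database name, table, and id ranges in the manual step are assumptions for illustration):

    liquibase update                   # fails with the 'please perform following update manually' error

    # perform the update manually, in id-range batches, e.g.:
    psql -d mydb -c "update my_table set column1='some' where id < 500000;"
    psql -d mydb -c "update my_table set column1='some' where id >= 500000;"

    liquibase mark-next-changeset-ran  # record the guard changeset as executed
    liquibase update                   # continue with the rest of the changelog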

    BTW, if we are talking about the theoretical possibility of getting a performance benefit, I would think about something like:

    -- requires the dblink extension: create extension if not exists dblink;
    do
    $$
        declare
            connections text[];
            tile        record;
            connection  text;
        begin
            begin
                -- split the id range into 10 tiles of roughly equal row counts
                for tile in (with tiles as (select ntile(10) over (order by id) as part, id
                                            from tbl)
                             select part, min(id) as rmin, max(id) as rmax
                             from tiles
                             group by part
                             order by part)
                    loop
                        -- open one extra connection per tile and send its update asynchronously
                        connection := 'q' || tile.part;
                        connections := array_append(connections, connection);
                        perform dblink_connect(connection, 'dbname=' || current_database());
                        perform dblink_send_query(connection,
                                                  'update tbl set column1 = ''some'' where id >= ''' ||
                                                  tile.rmin
                                                      || ''' and id <= ''' || tile.rmax || '''');
                    end loop;

                -- wait for all asynchronous updates to finish
                foreach connection in array connections
                    loop
                        perform * from dblink_get_result(connection) as t1(f1 text);
                    end loop;

                perform dblink_disconnect(c)
                from (select unnest(dblink_get_connections()) as c) conn
                where c = any (connections);

            end;

        exception
            when others then
                -- close any connections that are still open before re-raising
                perform dblink_disconnect(c)
                from (select unnest(dblink_get_connections()) as c) conn
                where c = any (connections);
                raise;

        end;
    $$;
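
    Note that each dblink connection runs and commits its own transaction, so the ten updates are no longer atomic as a whole: if one tile fails, other tiles may already be committed. That is the trade-off for parallelism and shorter individual transactions.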
    