
The problem is that we have multiple databases, and some of the tables can have millions of rows, so a query like

update my_table set column1='some'

takes 30 minutes or more, and sometimes the transaction fails because the connection is lost.

The idea is to somehow split the original query into parts at runtime and run them.

I hoped Liquibase would have a solution for this, but I didn't find one.

2 Answers


  1. I do include this type of statement in my Liquibase changelogs whenever I want to ensure that an update propagates across all of my environments, so this is a valid use case in my experience.

    Usually, what I do is shard the update by the id column, like:

    update my_table set column1='some' where id<500000

    update my_table set column1='some' where id>=500000

    You’ll have to decide what where clause makes the most sense for your data.
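
    For instance, a sketch of what this could look like in a formatted-SQL changelog (the author name, changeset ids, and the 500000 threshold are placeholders, and the example assumes a numeric id column):

    --liquibase formatted sql

    --changeset me:update-my-table-part-1
    update my_table set column1='some' where id < 500000;

    --changeset me:update-my-table-part-2
    update my_table set column1='some' where id >= 500000;

    Since each changeset runs in its own transaction by default, a failure in one part does not roll back the parts that already ran.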

  2. IMO, in the case of PostgreSQL the idea of "splitting the original query into parts" within a single connection/transaction does not make sense at all (I do not see any performance benefit there). Moreover, it is not clear what "sometimes transaction fails due connection lost or some" means – if you are experiencing infrastructure issues, you are supposed to resolve them by configuring/tuning the infrastructure.

    The Liquibase way of performing such a migration is the following:

    I. Define the SQL migration as:

    do
    $$
        begin
            raise exception 'please perform following update manually';
        end
    $$;
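
    For reference, embedded in a formatted-SQL changelog this might look like the following (author and changeset id are placeholders; splitStatements:false keeps Liquibase from splitting the do-block on its internal semicolons):

    --liquibase formatted sql

    --changeset me:manual-update-guard splitStatements:false
    do
    $$
        begin
            raise exception 'please perform following update manually';
        end
    $$;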
    

    II. The operator is then supposed to perform the following steps (sketched below):

    • run liquibase update and get the error
    • perform the update manually
    • run liquibase mark-next-changeset-ran
    • run liquibase update again
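
    A sketch of that workflow from the command line (the database name, table, and id ranges in the manual step are assumptions for illustration):

    liquibase update                   # fails with the 'please perform following update manually' error

    # perform the update manually, in id-range batches, e.g.:
    psql -d mydb -c "update my_table set column1='some' where id < 500000;"
    psql -d mydb -c "update my_table set column1='some' where id >= 500000;"

    liquibase mark-next-changeset-ran  # record the guard changeset as executed
    liquibase update                   # continue with the rest of the changelog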

    BTW, if we are talking about the theoretical possibility of getting a performance benefit, I would think about something like:

    -- requires the dblink extension: create extension if not exists dblink;
    do
    $$
        declare
            connections text[];
            tile        record;
            connection  text;
        begin
            begin
                -- split the id range into 10 tiles of roughly equal row counts
                for tile in (with tiles as (select ntile(10) over (order by id) as part, id
                                            from tbl)
                             select part, min(id) as rmin, max(id) as rmax
                             from tiles
                             group by part
                             order by part)
                    loop
                        -- open one extra connection per tile and send its update asynchronously
                        connection := 'q' || tile.part;
                        connections := array_append(connections, connection);
                        perform dblink_connect(connection, 'dbname=' || current_database());
                        perform dblink_send_query(connection,
                                                  'update tbl set column1 = ''some'' where id >= ''' ||
                                                  tile.rmin
                                                      || ''' and id <= ''' || tile.rmax || '''');
                    end loop;

                -- wait for all asynchronous updates to finish
                foreach connection in array connections
                    loop
                        perform * from dblink_get_result(connection) as t1(f1 text);
                    end loop;

                perform dblink_disconnect(c)
                from (select unnest(dblink_get_connections()) as c) conn
                where c = any (connections);

            end;

        exception
            when others then
                -- close any connections that are still open before re-raising
                perform dblink_disconnect(c)
                from (select unnest(dblink_get_connections()) as c) conn
                where c = any (connections);
                raise;

        end;
    $$;
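
    Note that each dblink connection runs and commits its own transaction, so the ten updates are no longer atomic as a whole: if one tile fails, other tiles may already be committed. That is the trade-off for parallelism and shorter individual transactions.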
    