skip to Main Content

The following procedure is run on a table with 20M records. It stopped at ~560000 and I don’t know why. I’m still learning DB monitoring/troubleshooting. This is on Aurora Postgres Serverless v2 with 6 ACUs (12 GB mem).

The content is html in a text column.

create or replace procedure schema_v2_update_href_again()
language plpgsql
as $$
declare
  max_id bigint;
begin

  select max(version_id) 
  from versions
  into max_id;
  
  for id in 0..max_id loop
    update versions
    set content = REGEXP_REPLACE(content, 'href="./US/eng/1/article/1/', 'href="./US/eng/1/article/', 'g')
    where version_id = id;
    
    if id % 10000 = 0 then
      raise notice 'id: %', id;
      commit;
    end if;

  end loop;

end; $$;

Should I be looking at the DB monitoring tools or could this be something that I’ve missed in the plpgsql?

The goal of the query is to go through each record and replace a url pattern in the html. There is no performance requirement. I’m not trying to be fast or efficient, I just wanted a simple procedure. Is this method naive?

Update #1:
schema for versions table

create table versions(
  version_id bigserial primary key,
  group_address entity_address not null default (0, 0),
  part_id bigint not null references parts(part_id),
  created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  content_type smallint not null,
  content text not null,
  root_entity_id bigint,
  root_entity_table table_name
);

create index on versions using hash (content);
create index on versions (part_id);

3

Answers


  1. Chosen as BEST ANSWER

    Not sure yet if this is the fix, but I noticed that a triggered proc that ran after the one I posted would raise notice hundreds of times per insert. I then remembered that in the "Data Output" section of pgAdmin there was an EOF line that didn't make sense to me (didn't think to mention that, woops). I'm thinking now the driver got too many messages back from running the call that it quit. Or maybe it happened on the server. I'm too new to Postgres to reason through this part so maybe someone knows as to whether there is a limit to print statements when running a procedure.


  2. If the table is large, this will be extremely slow issuing one update per row.

    If the intent is to update the whole table without locking it, you can do this by updating in batches. One update per range of 10,000 IDs.

      id := 1;
      batch_size := 10000;
      while id <= max_id loop
        update versions
        set content = REGEXP_REPLACE(content, 'href="./US/eng/1/article/1/', 'href="./US/eng/1/article/', 'g')
        where version_id between next_id and id + batch_size;
        
        raise notice 'id: %', id;
        commit;
        id = id + batch_size;
      end loop;
    

    Something like that, the syntax might not be totally correct.

    Login or Signup to reply.
  3. Do you really need to raise notices ?

    If not, you could do the job with a basic sql statement with better performance while avoiding errors due to the loop, couldn’t you ?

    update versions
    set content = REGEXP_REPLACE(content, 'href="./US/eng/1/article/1/', 'href="./US/eng/1/article/', 'g')
    where version_id >= 0
    

    or even embedded this statement in a sql procedure :

    create or replace procedure schema_v2_update_href_again()
    language sql
    as $$
        update versions
        set content = REGEXP_REPLACE(content, 'href="./US/eng/1/article/1/', 'href="./US/eng/1/article/', 'g')
        where version_id >= 0 ;
    $$;
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search