The following procedure is run on a table with 20M records. It stopped at ~560000 and I don’t know why. I’m still learning DB monitoring/troubleshooting. This is on Aurora Postgres Serverless v2 with 6 ACUs (12 GB mem).
The content is html in a text
column.
create or replace procedure schema_v2_update_href_again()
language plpgsql
as $$
declare
max_id bigint;
begin
select max(version_id)
from versions
into max_id;
for id in 0..max_id loop
update versions
set content = REGEXP_REPLACE(content, 'href="./US/eng/1/article/1/', 'href="./US/eng/1/article/', 'g')
where version_id = id;
if id % 10000 = 0 then
raise notice 'id: %', id;
commit;
end if;
end loop;
end; $$;
Should I be looking at the DB monitoring tools or could this be something that I’ve missed in the plpgsql?
The goal of the query is to go through each record and replace a url pattern in the html. There is no performance requirement. I’m not trying to be fast or efficient, I just wanted a simple procedure. Is this method naive?
Update #1:
schema for versions
table
create table versions(
version_id bigserial primary key,
group_address entity_address not null default (0, 0),
part_id bigint not null references parts(part_id),
created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
content_type smallint not null,
content text not null,
root_entity_id bigint,
root_entity_table table_name
);
create index on versions using hash (content);
create index on versions (part_id);
3
Answers
Not sure yet if this is the fix, but I noticed that a triggered proc that ran after the one I posted would
raise notice
hundreds of times per insert. I then remembered that in the "Data Output" section of pgAdmin there was anEOF
line that didn't make sense to me (didn't think to mention that, woops). I'm thinking now the driver got too many messages back from running thecall
that it quit. Or maybe it happened on the server. I'm too new to Postgres to reason through this part so maybe someone knows as to whether there is a limit to print statements when running a procedure.If the table is large, this will be extremely slow issuing one update per row.
If the intent is to update the whole table without locking it, you can do this by updating in batches. One update per range of 10,000 IDs.
Something like that, the syntax might not be totally correct.
Do you really need to raise notices ?
If not, you could do the job with a basic sql statement with better performance while avoiding errors due to the loop, couldn’t you ?
or even embedded this statement in a sql procedure :