skip to Main Content

We have an application going on for more than four years and a lot of data was being accumulated during this period. With the push for new features all this time, implementing archiving and clean up procedures where pushed back again and again.

Deleting records with DELETE statements was no longer an option as it would have taken forever. So we created a script that was acting in a more aggressive way by creating a temporary table with the records we wanted to keep (last month of data) then dropping the original table then renaming the temporary table to the right name and recreating the constrains and indexes, etc

CREATE TABLE fred_to_keep AS SELECT * FROM fred WHILE last_update < current_date - 31 * INTERVAL '1 day';
DROP TABLE fred;
ALTER TABLE fred_to_keep RENAME TO fred;
-- create indexes and constraints here, change the owner etc

Given the aggressive nature of our approach this was tested lots of times on databases with a similar data size and repartition with our production database and the overall time taken was about 2 and 1/2 hours. From the script logs one of the tables was taken one hour to clean up.

Then we decided to run it in production. Everything was going fine until it hit the table fred and after three hours the script was still not finished and we had to take the decision to kill the script and restore the database from the snapshot taken just before started. After waiting another half an hour we had no choice but to do the rollback.

Now coming from an Oracle background I know it is possible by interrogating dba_segments to see the table size increasing while it is being created. Trying to google for a similar thing in PostreSQL did not take me anywhere and I am wondering is there any way to actually see the progress of CREATE TABLE AS SELECT * FROM fred WHERE ... while this is being executed? Being able to see the space taken by the table being created and comparing it against expected size would have helped us to take a much more informed decision about killing the script or leaving it run. It is a bad feeling to think we killed it maybe just one minute before finishing.

2

Answers


  1. Example of using pg_relation_size() to monitor table size.

    create table dog(id integer);
    
    select pg_relation_size('dog');
     pg_relation_size 
    ------------------
                    0
    
    insert into dog values (1);
    select pg_relation_size('dog');
    pg_relation_size 
    ------------------
                 8192
    
    insert into dog select * from generate_series(1, 100000);
    select pg_relation_size('dog');
    pg_relation_size 
    ------------------
              3629056
    
    

    If you do this in a explicit transaction e.g. Begin; <DDL, DML statements> Commit; then other sessions will not be able to see/monitor until the Commit is done.

    UPDATE

    Potential work around for monitoring from other session.

    --Session 1
    BEGIN;
    create table dog(id integer);
    COMMIT;
    --Session 1
    BEGIN;
    insert into dog values (1);
    select pg_relation_size('dog');
     pg_relation_size 
    ------------------
                 8192
    ...
    --Session 2
    select pg_relation_size('dog');
     pg_relation_size 
    ------------------
                 8192
    ...
    --Session 1
    insert into dog select * from generate_series(1, 100000);
    select pg_relation_size('dog');
     pg_relation_size 
    ------------------
              3629056
    ...
    --Session 2
    select pg_relation_size('dog');
     pg_relation_size 
    ------------------
              3629056
    ...
    --Session 1
    COMMIT;
    

    Using on disk file to monitor.

    BEGIN;
    create table dog(id integer);
    SELECT pg_relation_filepath(oid) FROM pg_class WHERE relname = 'dog';
     pg_relation_filepath 
    ----------------------
     base/59704/241012
    
    -- ls -al base/59704/241012
    -- -rw------- 1 postgres users 0 Jan 21 08:33 base/59704/241012
    
    insert into dog values (1);
    
    -- ls -al base/59704/241012
    -- -rw------- 1 postgres users 8192 Jan 21 08:38 base/59704/241012
    
    insert into dog select * from generate_series(1, 100000);
    
    -- ls -al base/59704/241012
    -- -rw------- 1 postgres users 3629056 Jan 21 08:39 base/59704/241012
    
    Login or Signup to reply.
  2. If it is all done in one transaction, there is no good way to do this in postgresql from SQL, because until the creation of the table is committed, no one else can get there eyes on it. You can use file system tools to watch the file grow, but you first need to infer which file is the right one. If you don’t have access to the file system, I’m afraid you are out of luck.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search