We a have PostgreSQL 12.7 database with multiple schemas, each schema being assigned to a specific test environment.
Part of our data model evolution we had an ALTER TABLE my_table DROP COLUMN my_column
SQL that needs to be executed via Flyway BD evolution tool. Our spring boot application failed to start because Flyway failed with a timeout.
Connecting to the database as a DBA I tried to manually run that DROP COLUMN
statement with no result. It was taken forever. The table is a very small table with just 112 rows, you would expect the DROP statement will execute instantly. We had a very similar problem before with a DROP TABLE
statement on a different PostgreSQL database (same version) and the only work around was to destroy the database and recreate it in AWS. We cannot do the same with this database instance because of the multiple test environments connected to it. It will impact everyone. I tried dropping the schema for that particular environment using DROP SCHEMA my_schema CASCADE
, again I had to cancel after waiting for more than 10 minutes.
On the other hand my feeling is that is just a matter of time of when this is going to happen in PROD so we need to find a way to resolve this type of issues.
Any help would be greatly appreciated.
** UPDATE **
I ran suggested query:
SELECT pid,
usename,
pg_blocking_pids(pid) as blocked_by
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0
AND query ='ALTER TABLE my_table DROP COLUMN my_column;';
and it came up with this result:
pid | usename | blocked_by
-------+----------+-----------------------------------------
29688 | dba_root | {0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
(1 row)
If cancelling the ALTER TABLE ...
the above query brings nothing.
2
Answers
You can try to run this query:
Since the blocking process IDs show up as 0, they may be prepared transactions. Look into
pg_prepared_xacts
to see if there are any prepared transactions that a buggy application failed to clean up.