skip to Main Content

Our logical replication has started to fail with a lot of these errors on the subscriber:

ERROR:  could not create replication slot 
"pg_148211637_sync_148178452_7161337762731575223": ERROR:  all replication slots are in use
May  2 07:43:01 psql-06 postgres[1195903]: [2]  HINT:  Free one or increase max_replication_slots.
May  2 07:43:01 psql-06 postgres[1181479]: [1] LOG:  background worker "logical replication worker" (PID 1195903) exited with exit code 1
May  2 07:43:01 psql-06 postgres[1195905]: [1] LOG:  logical replication table synchronization worker for subscription "sub_production", table "products" has started
May  2 07:43:01 psql-06 postgres[1195901]: [1] LOG:  could not drop replication slot "pg_148211637_sync_148183118_7161337762731575223" on publisher: ERROR:  replication slot "pg_148211637_sync_148183118_7161337762731575223" does not exist

It seems like it’s constantly trying drop the replication slot pg_148211637_sync_148183118_7161337762731575223 on the publisher, but somehow it does not exists?

How can such a situation occur, and how do I resolve it/resume the replication?

2

Answers


  1. Connect to the primary and look for abandoned replication slots there:

    SELECT slot_name, slot_type, restart_lsn
    FROM pg_replication_slots
    WHERE NOT active;
    

    Any of these that don’t belong to a standby server that’s still needed should go:

    SELECT pg_drop_replication_slot('name of the slot');
    

    If you need more than 10 replication slots, increase max_replication_slots on the primary and restart it.

    It is hard to say how you got into trouble, but probably like this: whenever you say CREATE SUBSCRIPTION, the standby connects to the primary and creates a logical replication slot (unless you explicitly tell it not to). If then something goes wrong with creating the subscription, of if replication breaks, the replication slot is still there on the primary. So perhaps it took many failed attempts until you got logical replication running, and you neglecting cleaning up after failed attempts.

    Login or Signup to reply.
  2. You are misreading the error. The primary error is that it cannot create the slot, because all allowed slots are already in use.

    But the replica then tries to clean up after itself, which it does by attempting the drop the slot which it failed to create in the first place, which of course doesn’t work, because there is nothing to clean up. So this part fails, but it is not the interesting part of the overall failure.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search