Postgres Replication Slot & pg_basebackup

On #postgresql IRC channel, someone was having problem bringing up standby database server from backups taken using pg_basebackup and recovery.conf was configured to use replication slot.

The reporter was following  below steps:

  1. Set max_replication_slots=5, wal_keep_segments=0 and restart primary database
  2. create replication slot :
  3. SELECT * FROM pg_create_physical_replication_slot('standby')
  4. Call pg_basebackup -D – -Ft | bzip2 > backup.tar.bz2
  5. copy backup.tar.bz2 to standby server
  6. start postgresql on standby

The standby server was throwing following error during startup ..

 requested WAL segment 000000010000000C000000AE has already been removed.

Question from Reporter:

Should I set wal_keep_segments to some non-zero value if I want to use replication slots because while reading docs I understand that I shouldn’t set wal_keep_segments if I use max_replication_slots?

Short answer :

As of now, pg_basebackup doesn’t have support for replication slot so it doesn’t have details on status of WAL files being created or removed.

While the feature is being worked on, you can use one of the following solutions to get around the problem for now.

Solution 1:

On primay database server, set wal_keep_segments to some sane value relative to database traffic volume and time to make & transfer backup to bring up standby server. This setting will keep # of WAL files around on the primary server and will not delete them so when you bring up standby server using replication slot, it will start applying WAL files and eventually connect to physical slot created.

Solution 2:

Create slot before taking backup and use the slot at the least once so primary db server will keep the WAL files around.

  1. On primary db server, create slot
  2. connect to slot using pg_receivexlog
  3. /usr/pgsql-9.4/bin/pg_receivexlog -h primary-db-server -p 5432 -U replication –S 'standby1' -n -v -D /var/lib/pgsql/9.4/archive
  4. You can kill the pg_recievexlog process
  5. Take backup using pg_basebackup
  6. copy the backup and startup  standby database
  7. The standby database should be able to connect to slot now!
  8. Don’t forget to drop the slot otherwise Primary DB server will keep the WALs indefinitely

I would prefer Solution #2 over Solution #1 because you don’t have to figure out optimal value for wal_keep_segments. Plus, you can use pg_receivexlog method  for permanent archive solution. If you want to get more details, I will be presenting tutorial on replication topic “Out of the Box replication in Postgres 9.4” at Postgres Conference in San Francisco on Nov 17-18th. I will be provide hands on tutorial to setup robust replication for production db servers without using any third party tools in Postgres 9.4

If you haven’t registered, this is the time to register because the conference schedule looks amazing!  You could still register with the discount code SeeMeSpeak which provides a 20% discount.

Let me know, if you have better solutions than I discussed in the post. I’m looking forward to see you at the conference.

 

4 thoughts on “Postgres Replication Slot & pg_basebackup”

  1. It’s important to *drop the slot when you’re done*, otherwise your server will continue to accumulate WAL indefinitely.

    This is one of the reasons why it’d be good to teach pg_basebackup to handle slots its self.

    Ideally we’d also have the idea of a transient slot that’s dropped when the connection drops, so that users whose pg_basebackup failed or was terminated prematurely didn’t have dangling slots lying around. We don’t need to repeat the user confusion around pg_multixact.

Comments are closed.