Postgres – ERROR: Unsupported startup parameter: replication

Postgres provides much clear ERROR reporting messages compare to other databases/data-stores I have worked or been working at my $DAYJOB. However, someone reported following error on their Secondary Postgres replicated database server.

2015-01-22 15:20:49.247 GMT 56914 LOG: unexpected pageaddr 142D/73000000 in log file 5166, segment 75, offset 0
2015-01-22 15:20:49.413 GMT 42933 FATAL: could not connect to the primary server: ERROR: Unsupported startup parameter: replication

On the first site, it looks scary but it’s actually not !

I googled a bit but I couldn’t able to find quick answer. So, I investigated further to understand what does reported error mean?  Fortunately, the replication was up-to-date because most of our clients have been recommended or setup to use hybrid replication (WAL shipping using OmniPITR + Streaming Replication).

While discussing further with the team, it turned out they introduced pgbouncer, a connection pooling solution, in to their architecture to better manage connections on the db server. The pgbouncer is set up to run on port 5432 (default Postgres port) on the master database server and make Postgres to listen on 5433. This change wasn’t reflected on secondary replicated database server recovery.conf file and it was still pointing to port 5432. So, we found the cause of the error & the mystery is solved !!

recovery.conf file has been modified to use port 5433 and restarted secondary database to load the config changes. Yay !! Secondary database is connected to primary db server using streaming replication.

2015-01-22 15:41:09.357 GMT 49294 LOG: database system is ready to accept read only connections
2015-01-22 15:41:09.512 GMT 49296 LOG: restored log file "000000030000142E0000006A" from archive
2015-01-22 15:41:10.016 GMT 49296 LOG: unexpected pageaddr 142D/8E000000 in log file 5166, segment 107, offset 0
2015-01-22 15:41:10.182 GMT 49408 LOG: streaming replication successfully connected to primary

I wanted to share my experience that can be useful if you see this ERROR  in your Postgres installation ! Hope this will help :)

Postgres work_mem setting is a powerful performance tuning knob!

Postgres provides various parameters in postgresql.conf  for performance tuning related to help better use database server resources. At OmniTI, One of our client’s application was reported slowness during high intensive but very critical daily operations. The Postgres DB server is servicing mail generation application with 3-4K TPS and containing more than 400K partitioned tables.  The systems, including DB servers, are getting monitored through Circonus, so it was easier to review graph patterns to find out anything changed recently.

While investigating the graph patterns, we came across significantly high IO increase  (~10-11K per Sec) on Circonus IOSTAT graph around end of Nov. This pattern change matches with the Software upgrade. The software was upgraded by end of November !! As you can see below in the graph, the IO increased up to 10-11K per seconds after upgrading software since last week of Nov,2014.

IOSTAT_graph_0

It was easier to dig further because daily pgbadger log analysis reports were ready for review! Since Software upgrade, the most accessible stored function was generating ~8TB of of temp files per day!! Digging further into report, the average size of generated temp files were around ~100MB size.  As you might be aware, hash tables and sort operations should happen in memory , which depends on work_mem setting,  if the size of the operation is larger than work_mem , then it will end up happening on disks. The work_mem was set to 50MB. It’s clear that the last software upgrade introduced a change in function and underlying query. This change was the root cause of spike in disk IO activities observed in the IOSTAT graph.

The first attempt to tune the query so it avoids disk sorting but there are two challenges to tune the query:

(1) It wasn’t easy to change the application code because it has to go through application change process, which might take 1-2 months.

(2) the query is accessing pg catalog tables to gather details. The pg catalog tables are pretty large because of large number of partitioned tables. For example, pg_class table size is 2597 MB.

However, the server specs are beefy !  It has 252 GBs of total available RAM and have enough memory to allocate for memory sorting. We decided to increase work_mem from 50MB to 128MB.  Postgres allows to change work_mem without database restart, it was pretty easy to convince client to  buy in the proposed change.  After the change,  you can see in the graph below, it helped to reduce IO to 1/2 !!  That’s amazing improvement with a single online change :)

IOSTAT-blog

We are still working on the problematic query and some other temp table generating queries to reduce IO on the system.  Clearly, work_mem knob helped a lot until the queries are fixed permanently.

Wrapping up 2014 & Welcoming 2015!

It’s about a time to wrap up 2014 :)

This year, 2014, has been very successful and productive year for my career and professional development. I have got chance to speak at 5 conferences and a Postgres users group in Philly.  Yet again chance to attend Surge 2014 conference in Washington DC, hosted by my employer, OmniTI. Thanks to all conference committee members for inviting me for speaking and networking opportunities.

Further more, my team (database team) was nominated and won Silver medal in the Best Technology Team of the year category for the Stevie Awards.  Thanks a lot to everyone in the team for providing support to achieve this recognition.

Last but not the least, Postgres 9.4 was released last week. It is an awesome gift for us from Postgres community!  I’m looking forward to upgrade couple of large production systems  to Postgres 9.4 early next year.

Additionally, I’d like to push myself to involve more into supporting Postgres community over next  year(s). As of now, I will be speaking and/or attending Postgres Conf Russia & Postgres NYC conferences in early 2015. I’m hoping to write more blog posts on technologies in 2015.

I’m ready to Welcome 2015 :)

Thanks again for reading and your continuous support.