Possibility of Graceful Switchover?

I have been using PITR master-slave solution for long time for “failover” to slave server when there is a problem with the master server. It has proven a valuable solution for most of clients needs. The problem with “failover” solution is that we have to rebuild old-master as slave from completely new base backup from newly promoted master server. Oracle provides nice feature called “Graceful Switchover“. In this method, you can switch duty between master and slave servers and don’t have to rebuild old-master as slave from newly promoted master server. It could be very useful during server or data center move as we can shutdown old database and can have total control over xlogs to be processed before we start traffic on newly promoted master database.

I couldn’t find any graceful switchover document or blog for the PostgreSQL . Today, I spent some time on testing same theory as Oracle uses for the switchover. Oracle controls switchover using controlfile, Why can’t I use pg_control to do the same in totally controlled environment?
For PostgreSQL graceful switchover, I tested following scenario on very small test database server with pre configured setup of master and PITR slave database:
  • Step-1 : Shutdown current master database
  • Step-2 : Apply all wal logs including the one created during shutdown process to slave database server
  • Step-3: Promote slave as master in read/write mode using triggered failover file
  • Step-4: Before pointing production traffic to newly promoted master server, take backup of $PGDATA/global/pg_control and $PGDATA/global/pgstat.stat files. These two files differ between master and slave after failing over slave database into production mode.
  • Step-5: Setup old – master server for slave for newly promoted master sever and replace $PGDATA/global/pg_control and $PGDATA/global/pgstat.stat with the files backed up in Step-4
  • Step-6: Check log files on old-master to make sure that it’s recovering copied wal logs from newly promoted master server(old-slave server)
Yay! I could successfully switched over slave to master and vice versa !
At this stage, old-master(new slave) started playing wal logs coming from new-master(old-slave) server. After that, I failed over new slave server to make sure we can actually bring the database in production mode. It worked like a charm!! I ran database wide vacuum after failover and it ran fine without any error.
I have done limited testing as I have tested this with very tiny setup of “pagila” database on test server and I don’t recommend you to use on the “production database” servers as of now.
Please test this in your test or dev environment and let me know your results and concerns!!

PostgreSQL : Track functions to tune

Starting from PostgreSQL 8.4, there is a valuable addition in PostgreSQL features list: Per function statistics. Prior to this feature , there wasn’t an easy way to find out most time consuming functions in the database.

For time spent on single call, it’s easy to find out by executing function manually but it was too cumbersome and in some cases impossible to find out total time spent by function for the given time period. Enabling full logging and aggregating duration for each call by reading large log files was the only way to get that statistic. Now, you can find this details by querying a single view pg_stat_user_functions ! Awesome!
To enable function tracking you need to enable “track_functions” postgresql.conf parameter. The default is none, which disables function statistics tracking. Specify pl to track only procedural-language functions, all to also track SQL and C language functions.

#track_functions = none # none, pl, all

Once you have track_functions enabled in your database, you can get details by querying pg_stat_user_functions view:

select * from pg_stat_user_functions;


It provides following details:

funcid
schemaname
funcname
calls
total_time
self_time

Self time is the amount of time spent in the function itself, total time includes the time spent in functions it called. Time values are in milliseconds.

Here is the easy solution to track and find candidate that requires tuning!!

What’s blocking your way?

If you’ve ever gotten a page about database connections are blocked, or phone call or email from an annoyed user whose transaction just won’t go through, or from a developer who can’t understand why application sessions are blocking each other, you know how useful it can be to identify not just whose lock is doing the blocking, but what object is locked.

Omniti’s labs project Tasty Treats for PostgreSQL provides useful tools for proactive database monitoring . email_locked_queries is the new addition to this toolset . This tool can be handy for proactive lock monitoring in your database to find the blocked and blocking sessions and it sends out email notification if the there is a connection waiting for lock(s) for more than 10 minutes. It’s trivial to change the query if you want to be notified at different threshold.

Let’s test the query tool and see the results by creating locking scenario:

postgres=# create table testlock(foo text,bar text);
CREATE TABLE
postgres=# insert into testlock values (‘lock’,’me’);
INSERT 0 1 postgres=# insert into testlock values (‘find’,’me’);
INSERT 0 1
postgres=# select * from testlock;
foo | bar
——+—–
find | me
lock | me
(2 rows)

Session 1:

postgres=# begin;
BEGIN
postgres=# update testlock set foo='hang' where foo='lock';
UPDATE 1

And Session 2, try to update same row:

postgres=# begin;
BEGIN
postgres=# update testlock set foo='escapeme' where foo='lock';

This statement will hang, blocked by the lock that Session 1 is holding on the row.

I ran the query tool and I got this in my email:
-[ RECORD 1 ]
--------+-----------------------------------------------------
locked_relation |
locked_mode | ShareLock
blocked_pid | 9468
blocked_user | postgres
blocked_statement | update testlock set foo='escapeme' where foo='lock';
blocked_client_addr |
blocked_query_age | 00:10:28.98475
blocking_pid | 8056
blocking_user | postgres
blocking_statement | IDLE in transaction
blocking_client_addr |
blocking_query_age | 00:10:24.403767
blocking_xact_age | 00:10:58.529881

Let me know if you have suggestions to make it more better.