Category Archives: postgresql

What’s blocking your way?

If you’ve ever been paged because database connections are blocked, or gotten a phone call or email from an annoyed user whose transaction just won’t go through, or from a developer who can’t understand why application sessions are blocking each other, you know how useful it can be to identify not just whose lock is doing the blocking, but which object is locked.

OmniTI’s labs project Tasty Treats for PostgreSQL provides useful tools for proactive database monitoring. email_locked_queries is the newest addition to this toolset. It is handy for proactive lock monitoring: it finds the blocked and blocking sessions in your database and sends out an email notification if a connection has been waiting on lock(s) for more than 10 minutes. It’s trivial to change the query if you want to be notified at a different threshold.
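The 10-minute threshold lives in the monitoring query itself. As a rough illustration only (this is a sketch, not the actual email_locked_queries code, and it assumes the 8.x-era pg_stat_activity columns procpid and current_query), the kind of catalog query such a tool runs looks like this:

```sql
-- Sketch: list sessions waiting on a lock for more than 10 minutes,
-- together with the session holding the conflicting (granted) lock.
SELECT w.pid                  AS blocked_pid,
       wa.usename             AS blocked_user,
       wa.current_query       AS blocked_statement,
       now() - wa.query_start AS blocked_query_age,
       h.pid                  AS blocking_pid,
       ha.usename             AS blocking_user,
       ha.current_query       AS blocking_statement
FROM pg_catalog.pg_locks w
JOIN pg_catalog.pg_stat_activity wa ON wa.procpid = w.pid
JOIN pg_catalog.pg_locks h
  ON  h.locktype = w.locktype
  AND h.database      IS NOT DISTINCT FROM w.database
  AND h.relation      IS NOT DISTINCT FROM w.relation
  AND h.transactionid IS NOT DISTINCT FROM w.transactionid
  AND h.pid <> w.pid
  AND h.granted
JOIN pg_catalog.pg_stat_activity ha ON ha.procpid = h.pid
WHERE NOT w.granted
  AND now() - wa.query_start > interval '10 minutes';
```

The real tool additionally reports the locked relation, client addresses, and transaction age, and wraps the output in an email.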

Let’s test the query tool and see the results by creating a locking scenario:

postgres=# create table testlock(foo text, bar text);
CREATE TABLE
postgres=# insert into testlock values ('lock','me');
INSERT 0 1
postgres=# insert into testlock values ('find','me');
INSERT 0 1
postgres=# select * from testlock;
 foo  | bar
------+-----
 find | me
 lock | me
(2 rows)

Session 1:

postgres=# begin;
postgres=# update testlock set foo='hang' where foo='lock';

And in Session 2, try to update the same row:

postgres=# begin;
postgres=# update testlock set foo='escapeme' where foo='lock';

This statement will hang, blocked by the lock that Session 1 is holding on the row.

I ran the query tool and got this in my email:
-[ RECORD 1 ]
locked_relation |
locked_mode | ShareLock
blocked_pid | 9468
blocked_user | postgres
blocked_statement | update testlock set foo='escapeme' where foo='lock';
blocked_client_addr |
blocked_query_age | 00:10:28.98475
blocking_pid | 8056
blocking_user | postgres
blocking_statement | IDLE in transaction
blocking_client_addr |
blocking_query_age | 00:10:24.403767
blocking_xact_age | 00:10:58.529881

Let me know if you have suggestions to improve it.

Security & PgWest2010

Open source databases still need a solid presence in the enterprise world. One of the myths about open source databases is that they aren’t secure. To dispel this myth, we have to share our experience with the security side of PostgreSQL. At OmniTI, we help businesses achieve PCI compliance, and we provide a payment processor, PayPI, which runs on PostgreSQL.

I feel that the security side of PostgreSQL isn’t marketed the way it should be. Who doesn’t want secure applications and databases? Everyone cares about security, yet we can’t find many talks about it at open source conferences. PgWest 2010 has a wide variety of talks; thanks to the conference committee for including a number of talks on application and database security. One of them is my session on achieving PCI compliance with PostgreSQL. If your system transmits, processes, or stores credit card data, it must be PCI compliant. Please don’t miss this opportunity to share your knowledge and experience during the talk. Even if your system doesn’t need PCI compliance, we can still discuss security in general and share knowledge about security-related features.
If you are planning to attend PgWest 2010 in San Francisco from Nov 2 to Nov 4, 2010, I look forward to seeing you in person.

Outlining Surge Conference Sessions

OmniTI’s Surge conference is around the corner (Sept 30th), and there are plenty of industry-leading speakers and advanced technology sessions to choose from. Most probably I will be attending the second day of the two-day conference. I spent a lot of time deciding on the most interesting sessions to attend from the wide variety offered. At other conferences I have found it easy to choose between two parallel sessions, but that’s not the case at Surge: picking one of two parallel talks is one of the most difficult tasks.
I hope my list will help you outline your own sessions, or at least encourage you to start thinking now!
This is my outline of sessions to attend on Day 2 of the Surge conference:
  • Enterprise solutions from commodity components: The Promise and the Peril – Bryan Cantrill
  • Don’t bet the farm on your cache – Brian Akins
  • Top 10 Lessons Learned from Deploying Hadoop in a Private Cloud – Rod Cope
  • Design for Scale – Patterns, Anti-Patterns, Successes and Failures – Christopher Brown
  • From disaster to stability: scaling challenges of – Cosimo Streppone
  • Availability, the Cloud and Everything – Joe Williams

PostgreSQL Recovery Manager – PG_RMAN

During PgEast, I heard a lot about making PostgreSQL acceptable at more enterprise shops. I firmly believe that PostgreSQL has the potential, but lacks marketing around the product and its tools. Anyway, let’s start creating, testing, and using new tools that C-level people will find attractive!!
If you have an Oracle background, you must know Oracle Recovery Manager (RMAN). Oracle has had RMAN since Oracle 8 (maybe longer), and it has been enhanced with each release. It is widely used at most Oracle shops. Obviously, Oracle puts a lot of marketing effort into RMAN’s features with each release.
I came across a Google Code project called pg_rman, developed by the NTT Open Source Software Center. Its feature list looks promising; compared with Oracle, it includes features comparable to the Oracle9i RMAN feature list.

pg_rman features:
  • Ease of use. Backup and restore can be done with just one command.
  • Online full backup, incremental backup, and archive backup.
  • Backup compression. Unused areas in pages are removed and only actual data is compressed with gzip.
  • Automatic backup maintenance. Backup and archive WAL files older than specified days are deleted automatically.
  • Backup validation. Backups can be validated with CRC checks.
  • No transactions lost on restore. A configuration file generator for point-in-time recovery is supported.
Let’s put our hands together to test this new tool and help make PostgreSQL more advanced by adding such rich tools!!

I am going to test the pg_rman pretty soon!!

Secure your web applications from SQL injection

Over the last couple of months, I came across two incidents of SQL injection caused by some of the most common mistakes IT professionals make.

  1. How To Hack A Brazilian Power Company
  2. SQL Injection discovered on Wall Street Journal
Both incidents are stunning, and they were exploited by attackers with minimal effort. A little care during web application development could have avoided both.
If you are an IT professional, you must know the causes of and remedies for SQL injection.
Wikipedia provides a detailed definition:

“SQL injection is a code injection technique that exploits a security vulnerability occurring in the database layer of an application. The vulnerability is present when user input is either incorrectly filtered for string literal escape characters embedded in SQL statements or user input is not strongly typed and thereby unexpectedly executed. It is an instance of a more general class of vulnerabilities that can occur whenever one programming or scripting language is embedded inside another. SQL injection attacks are also known as SQL insertion attacks.”

The most common SQL injection types are:

1. Incorrectly filtered escape characters
  • Most common is Quote or Special character handling

2. Incorrect type handling

  • The Brazil incident falls into this category.

3. Vulnerabilities inside the database server

  • The WSJ incident falls into this category: a BAD password!

4. Blind SQL injection

  • The result of this attack is not visible to the attacker, but it can help to exploit further SQL injections.
  • Conditional responses
  • Conditional errors
  • Long-running queries

You can avoid these catastrophic SQL injections with a little care during web application coding: use bind variables, validate input values, and handle escape characters properly. Database-level security, such as strong passwords and access control, also helps to prevent SQL injection.
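To make the bind-variable point concrete, here is a small, self-contained sketch. It uses Python’s built-in sqlite3 module (rather than PostgreSQL, purely so the example runs anywhere) to show how string concatenation lets the input rewrite the query, while a bind variable treats the exact same input as plain data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

malicious = "' OR '1'='1"  # classic "incorrectly filtered quote" payload

# Vulnerable: concatenation splices the payload into the SQL itself,
# so the WHERE clause becomes  name = '' OR '1'='1'  -- always true.
unsafe_rows = conn.execute(
    "SELECT * FROM users WHERE name = '" + malicious + "'"
).fetchall()
print(unsafe_rows)  # every row leaks: [(1, 'alice')]

# Safe: the bind variable (?) passes the payload as a value, never as SQL.
safe_rows = conn.execute(
    "SELECT * FROM users WHERE name = ?", (malicious,)
).fetchall()
print(safe_rows)  # [] -- no user is literally named "' OR '1'='1"
```

The same principle applies in PostgreSQL client libraries: use the driver’s placeholder mechanism instead of building SQL strings by hand.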

I hope this will save you from future SQL injection 🙂

Postgres 8.4 – Replace regular index with Text Based Index

I came across a nice post by Peter Eisentraut that shows how a text_pattern_ops index can also be used for direct (equality) lookups on text fields.

A quick test proves it:

postgres=# \d employees
     Table "public.employees"
 Column |  Type   | Modifiers
--------+---------+-----------
 emp_id | integer | not null
 name   | text    |
Indexes:
    "employees_pkey" PRIMARY KEY, btree (emp_id)

postgres=# create index idx_employees_name on employees(name);
postgres=# explain select * from employees where name like 'Denish%';
Seq Scan on employees (cost=0.00..25.38 rows=6 width=36)
Filter: (name ~~ 'Denish%'::text)
(2 rows)

postgres=# explain select * from employees where name='Denish';
Bitmap Heap Scan on employees (cost=4.30..13.76 rows=6 width=36)
Recheck Cond: (name = 'Denish'::text)
-> Bitmap Index Scan on idx_employees_name (cost=0.00..4.30 rows=6 width=0)
Index Cond: (name = 'Denish'::text)
(4 rows)

postgres=# create index idx_employees_name_like on employees (name text_pattern_ops);
postgres=# explain select * from employees where name like 'Denish%';
Bitmap Heap Scan on employees (cost=4.31..13.78 rows=6 width=36)
Filter: (name ~~ 'Denish%'::text)
-> Bitmap Index Scan on idx_employees_name_like (cost=0.00..4.31 rows=6 width=0)
Index Cond: ((name ~>=~ 'Denish'::text) AND (name ~<~ 'Denisi'::text))
(4 rows)

postgres=# explain select * from employees where name='Denish';
Bitmap Heap Scan on employees (cost=4.30..13.76 rows=6 width=36)
Recheck Cond: (name = 'Denish'::text)
-> Bitmap Index Scan on idx_employees_name_like (cost=0.00..4.30 rows=6 width=0)
Index Cond: (name = 'Denish'::text)
(4 rows)

The example above shows that we don’t need the regular index on the name field, as the text-based index (idx_employees_name_like) fills the gap. However, for a range query, i.e. select * from employees where name > 'denish' and name < 'patel', the text-based index will not be used. I don’t think we ever run range queries on text fields, though!

Yet Another PostgreSQL Replication Tool – RubyRep

One of the key features any enterprise considers when choosing a database technology for their architecture stack is replication. Oracle and MySQL both have built-in replication solutions, but as of yet PostgreSQL doesn’t ship a built-in replication solution. There are many replication solutions available, however, and different companies use different solutions customized for their needs.

Among all of the solutions, Slony is probably the most widely tested and deployed within organizations, although it does have the following limitations:

  • Replicated tables must have a unique or primary key
  • It does not support replication of large objects
  • Schema changes are not propagated (though they can be coordinated)
  • It does not support synchronizing databases outside of replication
  • There are limitations on version compatibility; you cannot replicate from PostgreSQL 8.2 to PostgreSQL 8.4, for example
  • It is more difficult to set up than many other replication solutions

One new alternative to Slony is a project known as RubyRep, which is designed to avoid some of Slony’s limitations. RubyRep provides both master-slave and master-master replication, and it works for PostgreSQL as well as MySQL. It is developed by Arndt Lehmann, a German who has been living in Tokyo, Japan since 2001. He also provides great support on the RubyRep mailing list, especially around adding new features and fixing bugs.

RubyRep always operates on two databases. To make it simple to understand, the databases are referred to as “left” and “right” database respectively.

RubyRep’s key features include:

  • Simple configuration: the complete setup can be done via a single configuration file.
  • Simple installation: if you have a JVM installed, you just have to download and extract the files.
  • Platform independent: it runs on Unix and Windows platforms.
  • Table design independent: all commands work on tables whether they have a simple primary key (all data types acceptable), a combined primary key, or no primary key at all. It successfully processes multi-byte text and “big” data types.
  • It replicates the tsvector datatype.

In addition to the above, RubyRep actually provides three tools in one: the Compare, Sync, and Replicate tools.


Compare

This tool scans corresponding tables of the left and right databases, looking for diverging data. Key features of the comparison tool are:

  • Different output modes, from a count of differences to full row dumps.
  • Low bandwidth mode available, reducing the number of round-trips so only actual differences go through the network.
  • A progress bar with estimated remaining amount of work.
  • Server load is targeted toward only the “right” database server.

In one test we ran, we compared two 50-million-row tables in around 3 hours without affecting production server load. This is accomplished by comparing rows in batches; you can adjust the batch size in the configuration file.


Sync

The sync tool is used to synchronize data in corresponding tables of a left and right pair of databases. Key features of the sync tool are:

  • All features of the Compare tool also apply to syncs.
  • Automatically orders table syncs to avoid foreign key conflicts.
  • You can configure the sync policy to ignore deletes in the left database, to ignore creating records in the right database, and other such combinations.
  • Provides two prebuilt conflict resolution methods: either the left db wins or the right db wins.
  • Custom conflict resolution methods can be specified via Ruby code snippets.
  • Merge decisions can optionally be logged in the rubyrep event log table.


Replicate

Of course, RubyRep also provides a replication tool. Some of the key features of the replication tool include:

  • Automatically sets up all necessary triggers, log tables, etc.
  • Automatically discovers newly added tables and synchronizes the table content
  • Automatically reconfigures sequences to avoid duplicate key conflicts
  • Tracks changes to primary key columns
  • Can implement either master-slave or master-master replication
  • Prebuilt conflict resolution methods available include left or right wins, or earlier, later change wins
  • Custom conflict resolution specifiable via ruby code snippets
  • Replication decisions can optionally be logged in the rubyrep event log table

One of the problems common to replication solutions is setting up new nodes. With Slony, there are always some headaches caused by high load on the master database server as a result of the TRUNCATE/COPY cycle Slony goes through. In the case of RubyRep, most of the CPU load is on the slave server, and you can use the Sync command in advance before you start replicating the database. RubyRep also provides the flexibility to skip the sync if you don’t want to sync the database again.

RubyRep in action…

  • Help
    ; ./rubyrep --help
    Usage: ./bin/rubyrep [general options] command [parameters, ...]
    Asynchronous master-master replication of relational databases.
    Available options:
    --verbose Show errors with full stack trace
    -v, --version Show version information.
    --help Show this message
    Available commands:
    generate Generates a configuration file template
    help Shows detailed help for the specified command
    proxy Proxies connections from rubyrep commands to the database
    replicate Starts a replication process
    scan Scans for differing records between databases
    sync Syncs records between databases
    uninstall Removes all rubyrep tables, triggers, etc. from "left" and "right" database

  • Generate configuration file
    ; ./rubyrep generate pagila.conf

  • Compare/Sync Example:
    ; cat pagila.conf
    RR::Initializer::run do |config|
      config.left = {
        :adapter  => 'postgresql', # or 'mysql'
        :database => 'pagila',
        :username => 'rubyrep',
        :password => 'rubyrep',
        :host     => '',
        :port     => '5432'
      }
      config.right = {
        :adapter  => 'postgresql',
        :database => 'pagila',
        :username => 'rubyrep',
        :password => 'rubyrep',
        :host     => '127.0.0.1',
        :port     => '5483'
      }
      config.include_tables 'users'
      # config.include_tables /^e/ # regexp matching all tables starting with e
      # config.include_tables /./  # regexp matching all tables in the database
    end

    ; ./rubyrep scan -d=keys -b -c pagila.conf > users_diff.log
    ; cat users_diff.log
    users users ......................... 1
    - lastname: patel
    zipcode: "2096"
    userid: 48212620
    address: columbia

    ; ./rubyrep sync -c pagila.conf

  • Replication example:

By default, RubyRep runs in master-master replication mode, but you can adjust the following configuration setting to make it master-slave replication:

; cat pagila_replicate.conf
RR::Initializer::run do |config|
  config.left = {
    :adapter  => 'postgresql',
    :database => 'pagila',
    :username => 'rubyrep',
    :password => 'rubyrep',
    :host     => '',
    :port     => '5432',
    :schema_search_path => 'public,pagila'
  }
  config.right = {
    :adapter  => 'postgresql',
    :database => 'pagila',
    :username => 'rubyrep',
    :password => 'rubyrep',
    :host     => '127.0.0.1',
    :port     => '5483',
    :schema_search_path => 'public,pagila'
  }
  config.include_tables /./ # regexp matching all tables in the database
  config.options[:auto_key_limit] = 60
  config.options[:adjust_sequences] = false
  config.options[:sequence_increment] = 1
  # Sync policy: changes in the right database will not be applied to the left database.
  config.options[:right_record_handling] = :ignore
  config.options[:sync_conflict_handling] = :left_wins
  # Additional logging
  config.options[:logged_replication_events] = [:ignored_conflicts]
  # Ignore history tables
  config.exclude_tables /_history/
  config.exclude_tables 'test1'
  config.exclude_tables 'pagila'
end

Detailed information on each configuration setting can be found in the RubyRep documentation. There are also sample configuration files and a tutorial available for getting familiar with each of the features RubyRep offers. Based on our initial testing, it should also be possible to use RubyRep to upgrade older PostgreSQL databases from 8.2 to 8.4.