Upcoming Conferences in September

I thought I'd share details of the conferences I'm going to attend next month, in case you aren't aware of them but are interested in attending.

  1. PostgresOpen in Chicago
    • Postgres conference (Sept 17th – Sept 19th)
  2. Surge at National Harbor, MD
    • Scalability and Performance conference (Sept 24th – Sept 26th)

The lineup of speakers and talks looks interesting for both conferences. If you haven't registered yet, it's not too late! If you are planning to attend either of them, I look forward to meeting you in person.

See you soon!

10 commandments of Database Management

After a decade of experience managing database systems of all sizes and varieties, here is my first attempt at the top ten commandments of database management!

  1. Thou shalt always learn database systems' strengths and weaknesses
  2. Thou shalt choose the appropriate database to store and process data to empower the business
  3. Thou shalt always build systems to be resilient and cope with failures
  4. Thou shalt implement automated restore procedures to test backups
  5. Thou shalt always trend and monitor database performance and maintenance metrics
  6. Thou shalt document and follow database change management procedures
  7. Thou shalt plan to upgrade database systems in a timely manner
  8. Thou shalt always build tools to automate processes
  9. Thou shalt implement security policies and processes to secure data
  10. Thou shalt educate developers to write efficient code against databases

If your list of commandments differs from mine, I'm interested to hear it. Please don't hesitate to post it in a comment. Thanks!


Preventing Human Errors in Postgres

Whenever a discussion on disaster recovery planning for databases comes up, the topic of preventing and recovering from human error arises, because humans tend to make mistakes and engineers are humans :) Even with the best DBAs around and with restrictions and best practices/policies in place, you still can't afford to ignore the fact that an engineer could make a mistake. Disaster recovery planning should take this factor into consideration.

Fixing or recovering from human errors, especially inside databases, is costly; that's an accepted fact :( Better access controls, restrictions, and good practices in place can help avoid some of the silly mistakes, but you never know! What if a superuser (a DBA or anyone else with superuser privileges) makes the mistake?

While dealing with databases, there are countless unknown and unimaginable ways an engineer could make mistakes, but I'm going to talk about a couple of known, specific mistakes and how Postgres features can be used to prevent them:

Preventing DROP DATABASE

Last week, Payal, one of my colleagues, asked me: is there any way to prevent a superuser from dropping the production database? My immediate response was: a superuser (DBA) can't afford to make that mistake!

She asked me another question: what if I convert the production db to a template database so it can _NOT_ be dropped, even by a superuser? As you can see below, a template database cannot be dropped:

testdb=# drop database template1;
ERROR: cannot drop a template database

My immediate response was: hrmm.. there could be a security issue. The topic looked interesting, so I started digging further into making the production db a template database. As the documentation suggests, datistemplate can be set to indicate that a database is intended as a template for CREATE DATABASE. If this flag is set, the database can be cloned by any user with CREATEDB privileges; if it is not set, only superusers and the owner of the database can clone it.
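
If you want to see who could actually clone the database once the flag is set, a quick way is to check which roles hold CREATEDB or superuser. This audit query is my own addition, not part of the original discussion:

-- List roles that could clone a template database (CREATEDB) or do anything (superuser).
SELECT rolname, rolcreatedb, rolsuper
FROM pg_roles
WHERE rolcreatedb OR rolsuper;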

As you might have realized by now, there is a little risk because a user with CREATEDB can clone the production database. Fortunately, CREATEDB is not a default permission; you have to grant it explicitly. The gain from preventing DROP DATABASE is much higher than the underlying security risk. This is still an experimental thought! If you see any problems with setting the production db as a template database, please shout out! The example below converts "proddb" to a template database by setting the datistemplate flag to true and then tries to drop the database as a superuser:

testdb=# update pg_database set datistemplate=true where datname='proddb';
UPDATE 1
testdb=# drop database proddb;
ERROR: cannot drop a template database
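
The flip side, and this is just a sketch of my own rather than something from the original discussion, is that when you genuinely need to retire the database you have to clear the flag first:

-- Intentional removal: clear the template flag, then drop (superuser or owner only).
UPDATE pg_database SET datistemplate = false WHERE datname = 'proddb';
DROP DATABASE proddb;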

Preventing DROP TABLE

After coming up with a possible fix for DROP DATABASE, we decided to investigate similar ways of preventing a table from being dropped accidentally!

Postgres 9.3 introduced a new feature, event triggers (system-level triggers), which can be used to prevent table drops in production. The example below shows how to prevent a specific table from being dropped from the database:

Function to prevent single table drop

CREATE OR REPLACE FUNCTION confirm_drop()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
    LOOP
        IF obj.object_name = 'users' AND obj.schema_name = 'testing' THEN
            RAISE EXCEPTION 'command % is disabled for this table', tg_tag;
        END IF;
    END LOOP;
END;
$$;

Place System Level Trigger

CREATE EVENT TRIGGER confirm_drop ON sql_drop WHEN TAG IN ('DROP TABLE') EXECUTE PROCEDURE confirm_drop();

Let’s try to drop the USERS table…

postgres=# drop table testing.users;
ERROR: command DROP TABLE is disabled for this table
STATEMENT: drop table testing.users;

The output above shows that even a superuser cannot drop the table. You could use a similar approach to prevent DROP TABLE for a whole schema, or for all tables in the production database, as sketched below.
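
For example, here is a hedged sketch of my own, using a hypothetical schema named "production", of how the same function could be generalized to protect every table in one schema instead of a single named table:

-- Sketch: block DROP TABLE for any table in the (hypothetical) "production" schema.
CREATE OR REPLACE FUNCTION confirm_drop_schema()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
    LOOP
        IF obj.schema_name = 'production' THEN
            RAISE EXCEPTION 'command % is disabled for schema %', tg_tag, obj.schema_name;
        END IF;
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER confirm_drop_schema ON sql_drop
    WHEN TAG IN ('DROP TABLE') EXECUTE PROCEDURE confirm_drop_schema();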

However, I can think of two possible downsides of this approach: (i) the event trigger feature is not available before Postgres 9.3, and (ii) you have to take the event trigger into consideration whenever you actually intend to drop an individual table, or any table in the schema or database.
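
For (ii), one way to handle an intentional drop, again just a sketch, is to disable the event trigger around the change and re-enable it afterwards:

-- Intentional drop by a superuser: temporarily disable the event trigger.
ALTER EVENT TRIGGER confirm_drop DISABLE;
DROP TABLE testing.users;
ALTER EVENT TRIGGER confirm_drop ENABLE;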

I’d still rely on best practices to make sure human errors are avoided, by putting the required restrictions in place and following good procedures for schema changes. That said, PostgreSQL gives us the technical capability to prevent these mistakes with very low-overhead solutions. Is it worth taking advantage of? If you are planning on attending PGConf NYC or PGCon in Ottawa this year, let's discuss in person.

Thoughts/Comments/Concerns are welcome.

Postgres Security & PGConf NYC 2014

To my knowledge, I have never heard of or attended a Postgres conference with a dedicated "Security" track. Occasionally you find one or two talks about security across a couple of Postgres conferences, but never a dedicated track! It is indisputable that security is one of the most critical areas to consider when selecting a database for an application. I'm really excited to attend and speak about Postgres security features at PGConf NYC 2014, which will be held on April 3-4, 2014 in New York and will have a dedicated "Security" track! Hats off to the organizers :)

My employer, OmniTI, is one of the Gold sponsors of the event. If you are planning to attend the conference, please use the code "OMNITINYC14" at checkout; it will give you a 10% discount on both the Early Bird and Regular ticket prices. The code is valid through March 15th.

If you want to make friends in the Postgres community and are eager to learn about advanced Postgres features, and security features in particular, you probably don't want to miss this rare opportunity!

Send me a note if you want to catch up in person in NYC. Looking forward to seeing you there!

Can a Postgres 9.2 upgrade cause "Out of memory" errors?

Earlier this week, we got an email from DonorsChoose.org looking for "a serious Postgres expert" to solve a problem they had been experiencing, one that was a blocker for a number of projects at hand. They had heard good things about OmniTI from technology clubs and communities in NYC.

DonorsChoose.org is an online charity that makes it easy for anyone to help students in need. Public school teachers from every corner of America post classroom project requests on their site, and you can give any amount to the project that most inspires you.

This year, at the beginning of July, they migrated their Postgres database server from virtual hardware to a high-capacity bare-metal server and upgraded their databases from Postgres 8.2 to Postgres 9.2. As everyone hopes after an upgrade, the website's response time was much faster, and they should have been happy after moving to Postgres 9.2. That was the case for them as well :) Yes, they were happy Postgres users, except that some queries that used to run without any issue were now causing "out of memory" errors! Sometimes the queries even caused a segmentation fault (signal 11) :(

Weird, right?

Here is the email we received describing the problem:

We’ve been happy Pg users for years now and have a pretty good command of what’s going on. We recently upgraded to 9.2.x and moved onto new hardware at the same time. Everything’s screaming fast as we’d hoped and working well, but… Now our most-intensive queries are failing with an "out of memory" (SQL state: 53200) error. Not in production mind you, these are long-running queries we execute manually against a slave to infrequently do big exports.

On our old Pg version 8.2.x and much skimpier hardware, the job would take forever but complete, which was fine for the purpose. Now it’s on newer software with much more memory and CPU, but failing to complete. It seems to be failing on reports that use temporary tables that weren’t analyzed, and large queries during “hash join” and “merge join” operations. We’ve surely got something configured wrong, but we’ve been banging our heads against the wall and are out of ideas, eg. we’ve tried cranking work_mem way up, disabling hashjoin, no dice.

We requested a conference call to get more details, but we couldn't attend a call the next day because of a scheduled visit to NYC. When we mentioned that we were visiting NYC the next day, they asked if we could stop by their office! We said yes, because their office is on the way to OmniTI's NYC office :)

Late in the evening, the day before the visit, we sent out an email to get some background and more details about the environment, so we could come up with some ideas before the meeting:

1. OS version: uname -a
2. Postgres version (please provide the output of this query): select version();
3. Total memory on the server: free -m
4. The actual error with the query, from pg_log. If you could provide the log file with contents from one hour before and after the actual error, that would be helpful.
5. postgresql.conf file(s) from before and after the upgrade (Postgres 8.2 and Postgres 9.2).
6. Output of the show command in production: show all;
7. Explain output of the query on Postgres 9.2. If you have explain output from the earlier Postgres 8.2 version available, that would be helpful as well.

While riding the train the next day, we received answers to the questions via email.
Excerpts from the log file…

2013-10-08 18:24:43 EDT [13131]: [4-1] user=XXX,db=dc_query ERROR: out of memory
2013-10-08 18:24:43 EDT [13131]: [5-1] user=XXX,db=dc_query DETAIL: Failed on request of size 24.
2013-10-08 19:21:12 EDT [2001]: [2527-1] user=,db= LOG: server process (PID 17415) was terminated by signal 11: Segmentation fault
2013-10-08 19:21:12 EDT [2001]: [2528-1] user=,db= DETAIL: Failed process was running:

We analyzed the answers, and the problem characteristics looked similar to a situation we had come across with another client.
Problem characteristics:

  1. Upgraded database from Postgres 8.2 to Postgres 9.2
  2. Query is failing with an out of memory error
  3. The explain plan is damn big! The query is doing lots of joins!

The only, but big, difference was that we had not come across a segmentation fault before. We discussed the problem internally in the team, and everyone more or less agreed that it was an issue with max_locks_per_transaction. max_locks_per_transaction controls the average number of object locks allocated for each transaction; individual transactions can lock more objects as long as the locks of all transactions fit in the lock table. By default, max_locks_per_transaction is set to 64, which means that Postgres is prepared to track up to (64 × number of open transactions) locks. The problem is that with the same setting the query ran without any issue on Postgres 8.2, but failed with an out of memory error on Postgres 9.2. Not good, right?

After our morning meetings in NYC, we visited the DonorsChoose office. As planned, the first thing to test was changing the max_locks_per_transaction parameter to a higher value and restarting the database to pick up the setting, so we gave it a try on the QA database server.
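
For reference, the change itself is small. The value of 256 below is illustrative (the exact number we used isn't the point), and the parameter requires a restart to take effect:

-- Check the current value (default is 64).
SHOW max_locks_per_transaction;

-- In postgresql.conf (needs a database restart):
--   max_locks_per_transaction = 256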

You know what, the change solved the problem!

They hadn't been able to run the query since they upgraded to Postgres 9.2, but now they could :) While running the query, we noticed that it required around 150 locks, while the default value is only 64. The query joins 50+ tables, because it is a data warehouse query with a couple of unions!
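
If you want to see this for yourself, a rough way (my own addition) is to count the relation locks held by the backend running the query, from another session. The pid below is a placeholder:

-- Count relation-level locks held by the backend running the big query.
-- Replace 12345 with the actual backend pid (e.g. from pg_stat_activity).
SELECT count(*)
FROM pg_locks
WHERE pid = 12345
  AND locktype = 'relation';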

You must be wondering why the query worked on Postgres 8.2 but not on Postgres 9.2. My theory is that the max_locks_per_transaction value wasn't being honored until the implementation of true serializable isolation and the addition of the max_pred_locks_per_transaction parameter in Postgres 9.1. If you know the concrete reason, I will be happy to learn more about it.

In any case, if you come across a similar "out of memory" error after upgrading your database server to Postgres 9, you know the first thing to check: max_locks_per_transaction!

A week(s) of Conferences!

This week, I will be attending Surge (a scalability conference) in Washington D.C. Next week, I will be speaking at and attending PgOpen (a Postgres conference) in Chicago. If you are planning to attend either of these conferences, it will be a nice opportunity to meet in person and catch up on technologies and stories, specifically about databases!

See you soon!!

Postgres 9.2: monitoring temp files generation in real time

Along with other great new features, PostgreSQL 9.2 makes DBAs' lives easier by providing more metrics in the statistics collector views. Of those, pg_stat_database caught my eye because it provides a lot more detail compared to Postgres 9.1 and earlier versions.

Postgres 9.1

postgres=# \d pg_stat_database
        View "pg_catalog.pg_stat_database"
    Column     |           Type           | Modifiers
---------------+--------------------------+-----------
 datid         | oid                      |
 datname       | name                     |
 numbackends   | integer                  |
 xact_commit   | bigint                   |
 xact_rollback | bigint                   |
 blks_read     | bigint                   |
 blks_hit      | bigint                   |
 tup_returned  | bigint                   |
 tup_fetched   | bigint                   |
 tup_inserted  | bigint                   |
 tup_updated   | bigint                   |
 tup_deleted   | bigint                   |
 conflicts     | bigint                   |
 stats_reset   | timestamp with time zone |

Postgres 9.2

postgres=# \d pg_stat_database
        View "pg_catalog.pg_stat_database"
     Column     |           Type           | Modifiers
----------------+--------------------------+-----------
 datid          | oid                      |
 datname        | name                     |
 numbackends    | integer                  |
 xact_commit    | bigint                   |
 xact_rollback  | bigint                   |
 blks_read      | bigint                   |
 blks_hit       | bigint                   |
 tup_returned   | bigint                   |
 tup_fetched    | bigint                   |
 tup_inserted   | bigint                   |
 tup_updated    | bigint                   |
 tup_deleted    | bigint                   |
 conflicts      | bigint                   |
 temp_files     | bigint                   |
 temp_bytes     | bigint                   |
 deadlocks      | bigint                   |
 blk_read_time  | double precision         |
 blk_write_time | double precision         |
 stats_reset    | timestamp with time zone |

As you have noticed above, pg_stat_database has a number of new columns in Postgres 9.2 compared to previous versions. Here, I'm going to discuss the temp_files and temp_bytes columns.

The documentation is self-explanatory:

temp_files: Number of temporary files created by queries in this database. All temporary files are counted, regardless of why the temporary file was created (e.g., sorting or hashing), and regardless of the log_temp_files setting.

temp_bytes: Total amount of data written to temporary files by queries in this database. All temporary files are counted, regardless of why the temporary file was created, and regardless of the log_temp_files setting.

postgres=# select datname, temp_files, temp_bytes from pg_stat_database where datname = 'XXX';
  datname  | temp_files |  temp_bytes
-----------+------------+---------------
 XXX       |     107309 | 4650188504602
(1 row)

As you are aware, temp file generation can cause performance issues. Historically, it was possible to track temp file generation by enabling the log_temp_files setting in postgresql.conf and analyzing the logs with pgBadger. Now, Postgres 9.2 allows you to monitor temp file generation in real time! It is trivial to add the query above to start monitoring, trending, and alerting using Circonus; the graph below shows the resulting metrics.
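
For trending, a slightly expanded version of the query (the pg_size_pretty and ORDER BY are my additions) gives a quick per-database view:

-- Temp file counts and human-readable temp volume per database, busiest first.
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_bytes_pretty
FROM pg_stat_database
ORDER BY temp_bytes DESC;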

[Circonus graph: Left Y axis shows a counter of the number of temp files generated; Right Y axis shows a counter of temp file size (in MB) generated.]