Category Archives: postgresql

Analyzing the effect of default_statistics_target on ANALYZE

Last week, I received a request to increase PostgreSQL’s config parameter default_statistics_target on one of our very large DB clusters.  The developers filed a ticket to increase default_statistics_target to 10,000 at the cluster level because one of their queries had started doing a full sequential scan instead of an index scan.

By default, a Postgres cluster sets default_statistics_target to 100, but it can be set as high as 10,000. This parameter defines how many values are stored in the list of most common values, and it also determines the number of rows inspected by the ANALYZE process.

Out of curiosity, I dug into the Postgres ANALYZE code to figure out how many rows are scanned by ANALYZE based on the default_statistics_target value. As you can see at line 1763, ANALYZE inspects up to 300 × default_statistics_target rows.

1763  stats->minrows = 300 * attr->attstattarget;

If the default_statistics_target value is 100, ANALYZE could scan up to 300 * 100 = 30,000 rows to gather accurate statistics.
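
If you want to see the actual sample size on one of your own tables, ANALYZE VERBOSE reports it in its INFO output. A minimal check, assuming a hypothetical table named orders:

-- With default_statistics_target = 100, the INFO output should report
-- roughly "30000 rows in sample" (capped by the table's actual row count).
SET default_statistics_target = 100;
ANALYZE VERBOSE orders;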

As the Postgres documentation on ANALYZE notes, if you increase the value blindly at the cluster level, it could hurt smaller queries: query planning time can increase significantly when default_statistics_target is set to its maximum, which eventually degrades performance for simple queries.  The next logical step for me is to find the optimal value of default_statistics_target for the specific table with the slow query. I will probably start by increasing the value to 1,000 and then adjust it up or down based on EXPLAIN ANALYZE results.
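
Rather than touching the cluster-wide default at all, the statistics target can also be raised just for the column(s) involved in the slow query. A minimal sketch, using hypothetical table/column names:

-- Raise the statistics target only for the column used in the slow query
-- (orders/customer_id are placeholder names).
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;

-- Re-gather statistics so the planner sees the larger sample
-- (roughly 300 * 1000 = 300,000 rows inspected).
ANALYZE orders;

-- Revert to the cluster-wide default if it doesn't help.
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS -1;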

Hopefully this finding will help when you try to tune default_statistics_target for your database server. Let me know if you have alternate views or ideas for tuning this setting. Thanks!

Let’s hang out w/ Postgres people on Slack!

During casual discussions with Postgres users at conferences and online, I came across this question at least 3-4 times: “Is there a Slack channel for Postgres?”  The answer was “No”.

The idea of bringing up a Slack channel had been cooking in my mind for a long time, but I did not have time to execute on it! I finally brought up the Postgres Team Slack at the end of Nov 2015, and there are already 100+ users in the channel. You can sign up here: http://postgres-slack.herokuapp.com/

Here are the benefits of connecting to the Slack channel:

  • If you already use Slack for other communication, it is easy to add another company/team in the Slack application.
  • Slack provides integrations with a number of services, e.g. Twitter (the #postgres and #postgresql hashtags and the @postgres handle) and Postgres Weekly news.
  • I have yet to integrate Slack with the Postgres git repo because I don’t have access to add a post-commit hook on the repo. Please email me at denish.j.patel@gmail.com if you can help set up the integration for the Postgres git repo. I will provide instructions and the webhook URL to add to the GitHub repo settings.
  • You can access Slack from your mobile devices.
  • Even if you weren’t connected, you still have the chat history.

Hopefully this channel will help drive more people to discuss Postgres at scale!!

Disclaimer: 

If you are already using the Postgres IRC channel, I’m not insisting that you use this new Slack channel. I came up with the Slack channel mainly for Slack users who already use it for other purposes! Thanks for understanding.

Postgres Replication Slot & pg_basebackup

On the #postgresql IRC channel, someone was having a problem bringing up a standby database server from a backup taken using pg_basebackup, with recovery.conf configured to use a replication slot.

The reporter was following these steps:

  1. Set max_replication_slots=5 and wal_keep_segments=0, then restart the primary database
  2. Create a replication slot: SELECT * FROM pg_create_physical_replication_slot('standby');
  3. Run pg_basebackup -D - -Ft | bzip2 > backup.tar.bz2
  4. Copy backup.tar.bz2 to the standby server
  5. Start PostgreSQL on the standby

The standby server was throwing the following error during startup:

 requested WAL segment 000000010000000C000000AE has already been removed.

Question from Reporter:

Should I set wal_keep_segments to some non-zero value if I want to use replication slots? From reading the docs, I understood that I shouldn’t need to set wal_keep_segments if I use max_replication_slots.

Short answer:

As of now, pg_basebackup doesn’t support replication slots, so the backup itself does nothing to keep the primary from removing the WAL files the standby will need.

While the feature is being worked on, you can use one of the following solutions to get around the problem for now.

Solution 1:

On the primary database server, set wal_keep_segments to a sane value relative to database traffic volume and the time it takes to make and transfer the backup and bring up the standby server. This setting keeps that number of WAL files around on the primary instead of deleting them, so when you bring up the standby server using the replication slot, it will start applying WAL files and eventually connect to the physical slot you created.
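
For example, on 9.4 this could be done with ALTER SYSTEM (on older versions, edit postgresql.conf instead); the value below is only a placeholder that you would size from your own WAL volume and backup/transfer time:

-- Keep roughly 512 * 16MB = 8GB of WAL on the primary (placeholder value).
ALTER SYSTEM SET wal_keep_segments = 512;

-- wal_keep_segments only needs a reload, not a restart.
SELECT pg_reload_conf();

-- Verify the new setting.
SHOW wal_keep_segments;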

Solution 2:

Create the slot before taking the backup and use it at least once, so the primary db server will keep the WAL files around (the SQL involved is sketched after the list below).

  1. On the primary db server, create the slot
  2. Connect to the slot using pg_receivexlog:
  3. /usr/pgsql-9.4/bin/pg_receivexlog -h primary-db-server -p 5432 -U replication -S 'standby1' -n -v -D /var/lib/pgsql/9.4/archive
  4. You can kill the pg_receivexlog process
  5. Take the backup using pg_basebackup
  6. Copy the backup and start up the standby database
  7. The standby database should be able to connect to the slot now!
  8. Don’t forget to drop the slot once it is no longer needed; otherwise the primary DB server will keep WAL files around indefinitely
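
Here is a minimal SQL sketch of the slot housekeeping from steps 1 and 8 (the slot name 'standby1' matches the pg_receivexlog command above):

-- Step 1: create the physical replication slot on the primary.
SELECT * FROM pg_create_physical_replication_slot('standby1');

-- Check slot state: "active" shows whether a consumer is currently connected,
-- and restart_lsn shows how far back the slot is holding WAL.
SELECT slot_name, active, restart_lsn FROM pg_replication_slots;

-- Step 8: drop the slot once it is no longer needed, so the primary
-- stops retaining WAL for it.
SELECT pg_drop_replication_slot('standby1');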

I would prefer Solution #2 over Solution #1 because you don’t have to figure out the optimal value for wal_keep_segments. Plus, you can use the pg_receivexlog method as a permanent archiving solution. If you want more details, I will be presenting a tutorial on this topic, “Out of the Box Replication in Postgres 9.4”, at the Postgres Conference in San Francisco on Nov 17-18th. It will be a hands-on tutorial on setting up robust replication for production db servers in Postgres 9.4 without using any third-party tools.

If you haven’t registered yet, now is the time, because the conference schedule looks amazing!  You can still register with the discount code SeeMeSpeak, which provides a 20% discount.

Let me know if you have better solutions than the ones discussed in this post. I’m looking forward to seeing you at the conference.

 

pgbadger log analysis for Postgres RDS instances

Last year, I presented on Postgres RDS in Amazon at a couple of conferences, but I haven’t had a chance to follow up on my experience working with RDS since then. If you aren’t aware, I recently changed jobs and started working at WithMe, a disruptive start-up that focuses on three primary solutions for commerce, including WithMe Business, Financial and Retail.

At WithMe, we use a fair number of Postgres RDS instances. After joining, I reviewed the setup and found that pgbadger log analysis was not set up on the database servers. Even though the databases are still in the development phase, it makes sense to have reports that give software developers more insight into slow queries. I had other projects going on (schema design, database migrations, the DW system), so I wasn’t able to spend time identifying and tuning queries to help software engineers expedite their development, which is why we decided to prioritize the log analysis project.

At my previous job, I set up pgbadger reports on a number of servers, but most of the time I was dealing with real db servers, or at least had ssh access to them. On RDS, you only get psql access; there is no shell-level access on the instance. Additionally, you have to deal with AWS tools and access methods.

I followed these steps to set up pgbadger reports for Postgres RDS instances:

  1. Bring up a pgadmin EC2 instance to be used as a utility server.
  2. Install the AWS CLI tools to interact with the API.
  3. Make sure the /root/.aws/aws_credential_file file is in place:

    [denishpatel@pgadmin1 ~]$ aws configure --profile oregon
    AWS Access Key ID [None]: XXX
    AWS Secret Access Key [None]: XXX
    Default region name [None]: us-west-2
    Default output format [None]: text

    [denishpatel@pgadmin1 ~]$ aws rds describe-db-instances --profile oregon --output table

  4. Set the Postgres logging parameters in the parameter group (you can verify them from psql, as shown after this list):

    log_min_duration_statement=5

    log_line_prefix='%t:%r:%u@%d:[%p]:'

  5. I came up with a quick shell script to run the reports. The script is located at https://github.com/denishpatel/pgbadger_on_rds
  6. You can easily cron the reports. As new servers are added, I will add new cron entries for them:

    # pgbadger reports

    0 7 * * * ~/run_pgbadger_rds.sh -i devel-postgres -r us-west-2 -c 1 >/dev/null 2>&1
    15 7 * * * ~/run_pgbadger_rds.sh -i prod-postgres-rds -r us-east-1 -c 1 >/dev/null 2>&1

  7. I set up an nginx web server on the pgadmin host to serve the HTML reports, so the team doesn’t have to download the HTML files to review them.
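
Since RDS gives you psql access but no shell, a quick way to confirm that the parameter group changes from step 4 actually took effect is to query the settings directly. A minimal check:

-- Confirm the logging parameters applied through the RDS parameter group.
SHOW log_min_duration_statement;
SHOW log_line_prefix;

-- Or check both at once, including where each value came from.
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('log_min_duration_statement', 'log_line_prefix');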

In case you are tasked with setting up pgbadger reports on a Postgres RDS instance, this post might help! I haven’t spent a lot of time on the script, so it provides only the bare minimum to run the reports and leaves a lot of room for improvement. Hope to see your contributions to improve the script 🙂

As always, I’m looking forward to comments about your experience working with Postgres RDS 🙂

Postgres Replication using Replication Slots

Postgres 9.4 introduced an awesome feature: replication slots. They allow you to implement Postgres replication without using any external archive management tools.

Yesterday, I presented a three-hour tutorial, “Out of the box Postgres 9.4 Replication using Replication Slots”, at PGCon (the Postgres conference) in Ottawa, Canada. If you want to follow along with the slides using the VM, you can download it from the link posted on the conference tutorial schedule page.

Here is the slide deck:

Why should you consider upgrading Postgres again?

As the title suggests, if you are running Postgres in your environment, it is very important that you plan to upgrade to the latest announced release.

I upgraded Postgres in the last couple of weeks. Should I upgrade again? Why?

Yes, you should plan to upgrade again.

Postgres released a data corruption and security bug fix release on May 22nd and a follow-up release on June 4th to fix some of the issues introduced in the previous release. However, the June 4th release did not fix the multixact wraparound bugs (Thread1, Thread2) introduced in recent versions.  The wraparound bug could cause problems starting Postgres after a crash, so this release is critical to apply. The wraparound bug is most critical for 9.3 & 9.4, but it is a very good idea to upgrade the other versions as well to make sure there isn’t any problem. Read the release notes for further details.

Special care should be taken regarding vacuuming if you ever ran Postgres 9.3 and used pg_upgrade on your production databases.

What if I run Postgres 8.X (8.1, 8.2, 8.3, 8.4)?

Yes, absolutely!!  You are running an EOL (unsupported) version of Postgres. It is very important that you upgrade to a supported release, because you are likely exposed to some or all of the security and/or data loss bugs fixed in recent Postgres releases.

What if I run Postgres 9.X but have not applied the latest minor release?

Yes, you should upgrade to the latest minor release (the 3rd digit in the version numbering). A minor release only requires installing the new binaries and restarting the database.  It does _not_ require pg_dump/restore or pg_upgrade.
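
A quick sanity check before and after the minor upgrade, to confirm which release a cluster is actually running:

-- Full version string, e.g. "PostgreSQL 9.3.9 on x86_64-...".
SELECT version();

-- Numeric form (e.g. 90309 for 9.3.9), handy for scripted checks.
SHOW server_version_num;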

You should be upgrading to the following latest relevant Postgres release for your environment ASAP:

  • 9.4.4
  • 9.3.9
  • 9.2.13
  • 9.1.18
  • 9.0.22
How can I keep track of the supported Postgres releases?

The supported Postgres releases can be found here. It is worth noting that if you are running Postgres 9.0, you should plan for a major version upgrade (most likely to the latest Postgres 9.4.X release) before the holiday season begins, because Postgres 9.0.X will be EOL (unsupported) by September 2015.

Hope this helps convince your manager(s) to approve the Postgres database upgrade!

Preventing Human Errors in Postgres

Whenever a discussion on disaster recovery planning for databases comes up, the topic of “Preventing & Recovering from Human Error” arises, because humans tend to make mistakes and engineers are humans 🙂  Even with the best DBAs around and with restrictions and best practices/policies in place, you still can’t afford to ignore the fact that an engineer could make a mistake.  Disaster recovery planning should take this factor into consideration.

Fixing or recovering from human errors, especially inside databases, is costly, which is an accepted fact 🙁  Having proper access restrictions and good practices in place can help avoid some of the silly mistakes, but you never know!! What if the superuser (a DBA or anyone with super privileges) makes the mistake?

While dealing with databases, there are a number of unknown and unimaginable ways an engineer could make mistakes, but I’m going to talk about a couple of known, specific mistakes and how Postgres features can be used to prevent them:

Preventing DROP DATABASE

Last week, Payal, one of my colleagues, asked me: is there any way to prevent a superuser from dropping the production database? My immediate response was: a superuser (DBA) can’t afford to make that mistake!!!!

She asked me another question: what if I convert the production db to a template database so it can _NOT_ be dropped even if you are a superuser? As you can see below, a template database cannot be dropped:

testdb=# drop database template1;
ERROR: cannot drop a template database

My immediate response was: hrmm.. there could be a security issue. The topic looked interesting to me, so I started digging further into making the production db a template database. As the documentation suggests, datistemplate can be set to indicate that a database is intended as a template for CREATE DATABASE. If this flag is set, the database can be cloned by any user with the CREATEDB privilege; if it is not set, only superusers and the owner of the database can clone it.

As you might have realized by now, there is a small risk, because a user with CREATEDB can clone the production database. Fortunately, CREATEDB is not a default permission; you have to grant it explicitly.  The gain of preventing DROP DATABASE is much higher than the underlying security risk. This is still an experimental thought!! If you see any problems with setting the production db to a template database, please shout out!  The example below converts “proddb” to a template database by setting the datistemplate flag to true and then tries to drop the database as a superuser:

testdb=# update pg_database set datistemplate=true where datname='proddb';
UPDATE 1
testdb=# drop database proddb;
ERROR: cannot drop a template database
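
If you ever do need to drop the database intentionally, the flag simply has to be flipped back first. A minimal sketch:

-- Clear the template flag before an intentional drop (superuser only).
UPDATE pg_database SET datistemplate = false WHERE datname = 'proddb';
DROP DATABASE proddb;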

Preventing DROP TABLE

After coming up with a possible fix for DROP DATABASE, we decided to investigate similar ways of preventing a table from being dropped accidentally!!

Postgres 9.3 introduced a new feature, system-level (event) triggers. These can be used to prevent table drops in production. The example below shows how to prevent a specific table from being dropped from the database:

Function to prevent single table drop

CREATE OR REPLACE FUNCTION confirm_drop()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
    LOOP
        IF obj.object_name = 'users' AND obj.schema_name = 'testing' THEN
            RAISE EXCEPTION 'command % is disabled for this table', tg_tag;
        END IF;
    END LOOP;
END;
$$;

Place System Level Trigger

CREATE EVENT TRIGGER confirm_drop ON sql_drop WHEN TAG IN ('DROP TABLE') EXECUTE PROCEDURE confirm_drop();

Let’s try to drop the USERS table…

postgres=# drop table testing.users;
ERROR: command DROP TABLE is disabled for this table
STATEMENT: drop table testing.users;
ERROR: command DROP TABLE is disabled for this table

The output above shows that even a superuser cannot drop the table. You could use a similar approach to prevent DROP TABLE for an entire schema or for all tables in the production database; a variant along those lines is sketched below.
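
For instance, a variant of the same function could block drops for every table in a given schema. This is a sketch along the same lines as the function above; the schema name 'production' is just a placeholder:

CREATE OR REPLACE FUNCTION confirm_drop_schema()
RETURNS event_trigger
LANGUAGE plpgsql
AS $$
DECLARE
    obj record;
BEGIN
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
    LOOP
        -- Block drops of any object in the protected schema
        -- ('production' is a placeholder name).
        IF obj.schema_name = 'production' THEN
            RAISE EXCEPTION 'command % is disabled for schema %', tg_tag, obj.schema_name;
        END IF;
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER confirm_drop_schema ON sql_drop
    WHEN TAG IN ('DROP TABLE')
    EXECUTE PROCEDURE confirm_drop_schema();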

However, I can think of two possible downsides to this approach: (i) the event trigger feature is not available before Postgres 9.3; (ii) you have to take the event trigger into consideration when you actually intend to drop an individual table, or any table in the schema or database.

I’d still rely on best practices to keep human errors at bay, by putting the required restrictions in place and following good procedures for schema changes. Still, PostgreSQL gives us the technical capability to prevent these mistakes with very low-overhead solutions. Is it worth taking advantage of? If you are planning on attending PGConf NYC or PGCon in Ottawa this year, let’s discuss in person.

Thoughts/Comments/Concerns are welcome.

Postgres Security & PGConf NYC 2014

To my knowledge, I have not heard of or attended a Postgres conference with a dedicated “Security” track. Rarely, you find one or two talks about security across a couple of Postgres conferences, but never a dedicated track!  It is indisputable that security is one of the most critical areas to consider when selecting a database for an application. I’m really excited to attend and speak about Postgres security features at PGConf NYC 2014, which will be held on April 3-4, 2014 in New York and will have a dedicated “Security” track! Hats off to the organizers 🙂

My employer, OmniTI, is one of the Gold sponsors for the event.  If you are planning to attend the conference, please use the code “OMNITINYC14” at checkout; it will give you a 10% discount on both the Early Bird and Regular ticket prices. The code is valid through March 15th.

If you want to make friends in the Postgres community and are eager to learn about advanced Postgres features, and security features in particular, you probably don’t want to miss this rare opportunity!

Send me a note if you want to catch up in person in NYC.  Looking forward to seeing you there!

Can Postgres 9.2 upgrade cause “Out of memory” error?

Earlier this week, we got an email from DonorsChoose.org looking for “a serious Postgres expert” to solve a problem they had been experiencing, which was a blocker for a number of projects at hand. They had heard good things about OmniTI from technology clubs and communities in NYC.

DonorsChoose.org is an online charity that makes it easy for anyone to help students in need. Public school teachers from every corner of America post classroom project requests on their site, and you can give any amount to the project that most inspires you.

This year, at the beginning of July, they migrated their Postgres database server from virtual hardware to a high-capacity bare-metal server and upgraded their databases from Postgres 8.2 to Postgres 9.2. As everyone hopes after an upgrade, the website was much faster in response time, so they should have been happy after upgrading their database to Postgres 9.2. That was the case for them as well :)  Yes, they were happy Postgres users, except that some queries which used to run without any issue were now causing out of memory errors! Sometimes the queries were even causing a segmentation fault (signal 11) :(

Weird, right?

Here is the email we received describing the problem:

We’ve been happy Pg users for years now and have a pretty good command of what’s going on. We recently upgraded to 9.2.x and moved onto new hardware at the same time. Everything’s screaming fast as we’d hoped and working well, but… Now our most-intensive queries are failing with an “out of memory” (SQL state: 53200) error. Not in production mind you, these are long-running queries we execute manually against a slave to infrequently do big exports.

On our old Pg version 8.2.x and much skimpier hardware, the job would take forever but complete, which was fine for the purpose. Now it’s on newer software with much more memory and CPU, but failing to complete. It seems to be failing on reports that use temporary tables that weren’t analyzed, and large queries during “hash join” and “merge join” operations. We’ve surely got something configured wrong, but we’ve been banging our heads against the wall and are out of ideas, eg. we’ve tried cranking work_mem way up, disabling hashjoin, no dice.

We requested a conference call to get more details, but we couldn’t attend a call the next day because of an already scheduled visit to NYC. When we mentioned that we were visiting NYC the next day, they asked if we could stop by their office! We said yes, because their office is on the way to OmniTI’s NYC office :)

Late in the evening, the day before the visit, we sent out an email to get some background and more details about the environment, so we could come up with some ideas before the meeting:

1. OS version: uname -a
2. Postgres version (please provide the output of this query): select version();
3. Total memory on the server: free -m
4. Actual error with the query from pg_log. If you could provide the log file with contents one hour before and after the actual error, that would be helpful.
5. postgresql.conf file(s) for pre- and post-upgrade (Postgres 8.2 and Postgres 9.2)
6. Output of show commands in production: show all;
7. EXPLAIN output of the query on Postgres 9.2. If you have EXPLAIN output from the earlier Postgres 8.2 version available, that would be helpful as well.

While riding the train the next day, we received answers to the questions via email.
Excerpts from the log file…

2013-10-08 18:24:43 EDT [13131]: [4-1] user=XXX,db=dc_query ERROR: out of memory
2013-10-08 18:24:43 EDT [13131]: [5-1] user=XXX,db=dc_query DETAIL: Failed on request of size 24.
2013-10-08 19:21:12 EDT [2001]: [2527-1] user=,db= LOG: server process (PID 17415) was terminated by signal 11: Segmentation fault
2013-10-08 19:21:12 EDT [2001]: [2528-1] user=,db= DETAIL: Failed process was running:

We analyzed the answers, and the problem characteristics looked similar to a situation we had come across with another client.
Problem characteristics:

  1. Upgraded database from Postgres 8.2 to Postgres 9.2
  2. Query is failing with Out of Memory
  3. The explain plan is damn big! The query is doing lots of joins!

The only, but big, difference was that we had not come across a segmentation fault in that earlier case. We discussed the problem internally in the team. Everyone kind of agreed that it was an issue with max_locks_per_transaction. max_locks_per_transaction controls the average number of object locks allocated for each transaction; individual transactions can lock more objects as long as the locks of all transactions fit in the lock table. By default, max_locks_per_transaction is set to 64, which means that Postgres is prepared to track up to (64 × the number of concurrent transactions) locks. The puzzle was that on Postgres 8.2 the query ran without any issue with the same setting, but on Postgres 9.2 it failed with an out of memory error.  Not good, right?

After our morning meetings in NYC, we visited the DonorsChoose office. As planned, the first thing to test was raising the max_locks_per_transaction parameter and restarting the database to pick up the new setting, so we gave it a try on the QA database server.

You know what, the change solved the problem!

They hadn’t been able to run the query since they upgraded to Postgres 9.2, but now they could :)  While running the query, we noticed that it required around 150 locks, while the default value is only 64. The query joins 50+ tables because it is a data-warehouse query with a couple of unions!
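
If you want to check this on your own system, you can watch lock consumption from another session while the big query runs, and confirm the current limit. A rough sketch; the backend pid below is a placeholder you would look up in pg_stat_activity:

-- Current lock table sizing parameters.
SHOW max_locks_per_transaction;
SHOW max_connections;

-- Count locks held by the backend running the big query
-- (12345 is a placeholder pid from pg_stat_activity).
SELECT count(*) FROM pg_locks WHERE pid = 12345;

-- Note: raising max_locks_per_transaction requires editing postgresql.conf
-- and restarting the server; it cannot be changed with a reload.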

You must be wondering why the query worked on Postgres 8.2 but doesn’t work on Postgres 9.2. My theory is that the max_locks_per_transaction value wasn’t being strictly honored until the implementation of true serializable isolation and the addition of the max_pred_locks_per_transaction parameter in Postgres 9.1. If you know the concrete reason, I will be happy to learn more about it.

In any case, if you come across a similar “out of memory” error after upgrading your database server to Postgres 9, you know the first thing to check: max_locks_per_transaction!