May 22

Inserting JSON data into Postgres using JDBC driver

EDIT:  Marcus(1st comment provider) helped me to write much cleaner and secure code. It doesn’t require CAST function and uses  PGobject with jdbc’s setObject. You could download updated code from git-repo. Thanks Marcus !!

One of the clients of OmniTI requested help to provide sample application to insert JSON data into Postgres using Java JDBC driver . I’m not Java expert so it took a while for me to write a simple java code to insert data. TBH, I took help to write test application from one of our Java engineers at OmniTI. Now, test application is ready and next step is to make it work with JSON datatype ! After struggling a little to find out work around for string escaping in JAVA code, I stumbled upon data type issue! Here is the test application code to connect to my local Postgres installation and insert JSON data into sample table:

postgres=# \d sample
Table "public.sample"
Column | Type | Modifiers
--------+---------+-----------
id | integer |
data | json |
denishs-MacBook-Air-2:java denish$ java -cp $CLASSPATH PgJSONExample
-------- PostgreSQL JDBC Connection Testing ------------
PostgreSQL JDBC Driver Registered!
You made it, take control your database now!
Something exploded running the insert: ERROR: column "data" is of type json but expression is of type character varying
Hint: You will need to rewrite or cast the expression.
Position: 42

After some research , I found out that there is no standard JSON type on java side so adding support for json to postgres jdbc is not straight forward ! StackOverflow answer helped me for testing out the JSON datatype handling at psql level. As Craig mentioned in the answer that the correct way to solve this problem is to write a custom Java mapping type that uses the JDBC setObject method. This can be a tricky though.  A simpler workaround is to tell PostgreSQL to cast implicitly from text to json:
postgres=# create cast (text as json) without function as implicit;
CREATE CAST

The WITHOUT FUNCTION clause is used because text and json have the same on-disk and in-memory representation, they’re basically just aliases for the same data type. AS IMPLICIT tells PostgreSQL it can convert without being explicitly told to, allowing things like this to work:

postgres=# prepare test(text) as insert into sample (data) values ($1);
PREPARE
postgres=# execute test('{}');
INSERT 0 1
postgres=# select data from sample;
data
----
{}
(1 row)

Awesome ! That worked :-) Let’s try similar approach in Java application code.

denishs-MacBook-Air-2:java denish$ export CLASSPATH=/usr/share/postgresql/java/postgresql-9.2-1002.jdbc4.jar:
denishs-MacBook-Air-2:java denish$ javac -classpath $CLASSPATH PgJSONExample.java
denishs-MacBook-Air-2:java denish$ java -cp $CLASSPATH PgJSONExample
-------- PostgreSQL JDBC Connection Testing ------------
PostgreSQL JDBC Driver Registered!
You made it, take control your database now!
postgres=# select * from sample;
id | data
----+------------------------------------------------------------------------
1 | {"username":"denish","posts":10122,"emailaddress":"denish@omniti.com"}
(1 row)

Yay! It worked as well :-)

Next in my list to figure out installing PL/Java on Mac and/or Linux !! Let me know, if you have instructions for installation and test application using PL/Java.

Apr 20

When was the database created in Postgres cluster ?

Continuous Integration (CI)  using automated open source tools such as Jenkins and Hudson  is getting adoption rapidly. These tools help developers to gain confidence for creating more robust code rapidly by improving testing and QA process. The flexibility of these softwares add other challenges for the DBAs!

One of our client came across challenge to cleanup databases after X number of days from the Jenkins CI database because each run create seperate database and database names are not standard because they are provided by users. If they don’t cleanup old database, the cluster will have hundreds of databases at the end of the week. We tried to standardize database names but you can’t control users to make mistakes or input db names :-) On the other hand, Postgres’s system catalog view doesn’t provide database creation date.  How can I find out databases older than X days and drop them?

I came across this blog entry that answers my question but I was looking for easier way! I found easier way to get the database creation time with single query! Yay  :-) Following is the query that can be used to find the database creation time. The query should return correct created_date as long as you haven’t run pg_upgrade on the data directory.  I thought to share here so it will be useful for others!

SELECT datname, (pg_stat_file(‘base/’||oid||’/PG_VERSION’)).modification AS datcreated
FROM pg_database;

 

postgres=# SELECT datname, (pg_stat_file(‘base/’||oid||’/PG_VERSION’)).modification AS datcreated
postgres-# FROM pg_database;
datname | datcreated
————+————————
template1 | 2013-03-28 16:04:13-04
template0 | 2013-03-28 16:04:14-04
postgres | 2013-03-28 16:04:14-04
rangetypes | 2013-03-28 16:14:42-04
puppet | 2013-03-28 16:23:13-04
omniti | 2013-04-20 10:02:22-04
(6 rows)

 

Ideally,  pg_database system catalog view should include database_created timestamp ! Hopefully, that day will come sooner than later :-)

Feel free to comment, if you have any other ideas for getting this details or you see any corner cases with above query that I haven’t mentioned here  :-)

Dec 05

pg_repack in action!

Couple of years ago, I started compiling blog post on pg_reorg but that post never made it to published post because of my procrastination !! Though, I wasn’t disappointed because I got opportunity to talk about removing bloat from tables on databases at one of the PostgreSQL conference. Moreover, Depesz ,  colleague at OmniTI,  wrote a detailed post on how pg_reorg actually works :-)  So, you must be thinking , What’s up with this blog post ?

Let’s come to the point :-)

Last month, I received email in pg_reorg mailing list that first release of pg_repack beta1 was released ! So, What is pg_repack ? The first release of pg_repack is simply a fork of  pg_reorg. The author provided the reason of the fork is to revive the development of pg_reorg, which has been stagnated since the release of pg_reorg 1.1.7 in August 2011. That makes sense to me.  The first release doesn’t’ provide any new functionality but add specifically the missing features planned for pg_reorg 1.1.8 and fixes it’s known bugs. That’s good thing ! As I mentioned, it’s provide same functionality as pg_reorg but now you should follow more lively code of pg_repack instead of pg_reog on PGXN project page  :-)

At OmniTI, the engineers contribute new tools to community and use existing tools. We decided to give a swing at pg_repack because we have been using pg_reorg for last couple of years successfully on production systems and familiar with the code base. Now, pg_repack supports extension so it was pretty easy to install tool as extension on one of the PostgreSQL database system and ran it during low peak hours. The pg_repack run helped to trim down database size from 550GB to 400GB. Now a days,if no graph, it never happened :-) Here is the DB size graph for the reference. If you are curious, the graph was created using Circonus monitoring suite.

 

Thanks to pg_repack authors and contributors. If you haven’t join pg_reorg/pg_repack mailing lists, I would recommend you to join. Keep contributing and sharing the results to community!!

Happy Holidays!!