
Postgres 11 partitioning

Postgres has supported table partitioning based on inheritance and triggers for more than a decade now. Declarative partitioning support was added in the Postgres 10 release in October 2017. Since Postgres 10, built-in declarative partitioning has made it easier to create partitions, but you still had to manage triggers for updating records through the parent table. Additionally, you could not add primary keys or foreign keys on partitioned tables. The recent Postgres 11 release solves all of these problems.

Postgres 11 adds many more partitioning features that make managing partitioned tables easier than ever! Below is a comparison of partitioning features across Postgres releases:

 Feature                                                                      | Postgres 11 | Postgres 10 | 9.6
------------------------------------------------------------------------------+-------------+-------------+-----
 Declarative table partitioning                                               | Yes         | Yes         | No
 Default partition (stores rows that do not match any other partition)        | Yes         | No          | No
 Partitioning by a HASH key                                                   | Yes         | No          | No
 PRIMARY KEY, FOREIGN KEY, indexes, and triggers on partitioned tables        | Yes         | No          | No
 UPDATE on the partition key (the row is moved to the appropriate partition)  | Yes         | No          | No

Postgres 11 supports RANGE, LIST, and HASH partition types. You can also create sub-partitions on child tables!
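
RANGE and LIST are covered in the examples below; for completeness, here is a minimal, hedged sketch of HASH partitioning (the events table and its columns are hypothetical and not used elsewhere in this post):

-- HASH partitioning spreads rows across a fixed number of partitions
CREATE TABLE events (id bigint not null, payload jsonb) PARTITION BY HASH (id);

CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE events_p3 PARTITION OF events FOR VALUES WITH (MODULUS 4, REMAINDER 3);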

Let’s take an example to partition the table using RANGE and LIST partition types.

RANGE Partitioning:
 -- create parent table to store SMS campaign subscribers 
app=# CREATE TABLE sms_campaign_subscribers (id bigint not null, sms_campaign_id bigint not null) PARTITION BY RANGE (sms_campaign_id);
CREATE TABLE

-- create child table to store campaign with sms_campaign_id >= 111 and < 112
app=# CREATE TABLE sms_campaign_subscriber_111 PARTITION OF sms_campaign_subscribers FOR VALUES FROM (111) TO (112) ;
CREATE TABLE

-- Describe parent table
app=# \d+ sms_campaign_subscribers
 Table "public.sms_campaign_subscribers"
 Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------+--------+-----------+----------+---------+---------+--------------+-------------
 id | bigint | | not null | | plain | |
 sms_campaign_id | bigint | | not null | | plain | |
Partition key: RANGE (sms_campaign_id)
Partitions: sms_campaign_subscriber_111 FOR VALUES FROM ('111') TO ('112')

-- Describe child table

app=# \d+ sms_campaign_subscriber_111
 Table "public.sms_campaign_subscriber_111"
 Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------+--------+-----------+----------+---------+---------+--------------+-------------
 id | bigint | | not null | | plain | |
 sms_campaign_id | bigint | | not null | | plain | |
Partition of: sms_campaign_subscribers FOR VALUES FROM ('111') TO ('112')
Partition constraint: ((sms_campaign_id IS NOT NULL) AND (sms_campaign_id >= '111'::bigint) AND (sms_campaign_id < '112'::bigint))
 -- insert into parent table
app=# insert into sms_campaign_subscribers values(1,'111');
INSERT 0 1
 -- query parent table
app=# select * from sms_campaign_subscribers;
 id | sms_campaign_id
----+-----------------
 1 | 111
(1 row)

-- you can also insert directly into child table

app=# insert into sms_campaign_subscriber_111 values(2,'111');
INSERT 0 1

-- query parent table
app=# select * from sms_campaign_subscribers;
 id | sms_campaign_id
----+-----------------
 1 | 111
 2 | 111
(2 rows)

-- query directly child table
app=# select * from sms_campaign_subscriber_111;
 id | sms_campaign_id
----+-----------------
 1 | 111
 2 | 111
(2 rows)

 

Problem:

While working on partitioning, I stumbled upon the syntax issue below…

app=# CREATE TABLE sms_campaign_subscribers (id bigint not null, sms_campaign_id bigint not null) PARTITION BY RANGE (sms_campaign_id);
CREATE TABLE
app=# CREATE TABLE sms_campaign_subscriber_111 PARTITION OF sms_campaign_subscribers FOR VALUES FROM (111) TO (112) PARTITION BY RANGE(sms_campaign_id);
CREATE TABLE
app=# \d sms_campaign_subscribers
 Table "public.sms_campaign_subscribers"
 Column | Type | Collation | Nullable | Default
-----------------+--------+-----------+----------+---------
 id | bigint | | not null |
 sms_campaign_id | bigint | | not null |
Partition key: RANGE (sms_campaign_id)
Number of partitions: 1 (Use \d+ to list them.)
app=# \d sms_campaign_subscriber_111
 Table "public.sms_campaign_subscriber_111"
 Column | Type | Collation | Nullable | Default
-----------------+--------+-----------+----------+---------
 id | bigint | | not null |
 sms_campaign_id | bigint | | not null |
Partition of: sms_campaign_subscribers FOR VALUES FROM ('111') TO ('112')
Partition key: RANGE (sms_campaign_id)

The inserts were failing…

 -- insert into parent table
app=# insert into sms_campaign_subscribers values(1,'111');
ERROR: no partition of relation "sms_campaign_subscriber_111" found for row
DETAIL: Partition key of the failing row contains (sms_campaign_id) = (111).
 
-- insert directly into child table
app=# insert into sms_campaign_subscriber_111 values(1,111);
ERROR: no partition of relation "sms_campaign_subscriber_111" found for row
DETAIL: Partition key of the failing row contains (sms_campaign_id) = (111).
Solution:
The problem was with the CHILD table creation statement below…
CREATE TABLE sms_campaign_subscriber_111 PARTITION OF sms_campaign_subscribers FOR VALUES FROM (111) TO (112) PARTITION BY RANGE(sms_campaign_id);
In the statement above, I was actually creating a sub-partitioned child table, so there was no leaf partition to receive rows for the parent table. Once I got rid of PARTITION BY RANGE(sms_campaign_id) from the CHILD table creation statement, things worked as expected. Thanks to Keith for helping me identify the issue.
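
For reference, the corrected child table statement is simply the original one without the trailing PARTITION BY clause:

CREATE TABLE sms_campaign_subscriber_111 PARTITION OF sms_campaign_subscribers FOR VALUES FROM (111) TO (112);
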
LIST Partitioning:

In my case, it actually makes more sense to use LIST partitioning instead of RANGE partitioning.

-- drop table if exists
app=# drop table if exists sms_campaign_subscribers;
DROP TABLE
-- create parent table
app=# CREATE TABLE sms_campaign_subscribers (id bigint not null, sms_campaign_id bigint not null) PARTITION BY LIST (sms_campaign_id);
CREATE TABLE

-- create child table
app=# CREATE TABLE sms_campaign_subscriber_111 PARTITION OF sms_campaign_subscribers FOR VALUES IN (111) ;
CREATE TABLE
app=# \d+ sms_campaign_subscribers
 Table "public.sms_campaign_subscribers"
 Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-----------------+--------+-----------+----------+---------+---------+--------------+-------------
 id | bigint | | not null | | plain | |
 sms_campaign_id | bigint | | not null | | plain | |
Partition key: LIST (sms_campaign_id)
Partitions: sms_campaign_subscriber_111 FOR VALUES IN ('111')

app=# insert into sms_campaign_subscribers values(1,'111');
INSERT 0 1
app=# insert into sms_campaign_subscribers values(2,'111');
INSERT 0 1
app=# select * from sms_campaign_subscribers;
 id | sms_campaign_id
----+-----------------
 1 | 111
 2 | 111
(2 rows)

Let’s try to insert a row outside of the defined partition values… it will fail!

app=# insert into sms_campaign_subscribers values (3,113);
ERROR: no partition of relation "sms_campaign_subscribers" found for row
DETAIL: Partition key of the failing row contains (sms_campaign_id) = (113).

Let’s make use of the Postgres 11 “default partition” feature to store data that doesn’t fall into any of the specified partitions…

app=# CREATE TABLE sms_campaign_subscriber_default PARTITION OF sms_campaign_subscribers DEFAULT ;
CREATE TABLE
app=# insert into sms_campaign_subscribers values (3,113);
INSERT 0 1
app=# select * from sms_campaign_subscribers;
 id | sms_campaign_id
----+-----------------
 1 | 111
 2 | 111
 3 | 113
(3 rows)
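
Postgres 11 also moves rows to the correct partition when the partition key itself is updated (one of the features listed in the comparison table above). A hedged sketch using the tables we just created:

-- In Postgres 11, updating the partition key moves the row from
-- sms_campaign_subscriber_111 into the default partition;
-- in Postgres 10 the same UPDATE would raise an error instead.
update sms_campaign_subscribers set sms_campaign_id = 113 where id = 1;

-- the moved row should now show up here
select * from sms_campaign_subscriber_default;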

Yay. Everything seems to be working 🙂

I went through a basic example, but in Postgres 11 you can also add PRIMARY KEY and FOREIGN KEY constraints on partitioned tables, or even create sub-partitions. I will explore them in future posts.
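
As a quick taste, here is a minimal, hedged sketch of a primary key on a partitioned table in Postgres 11; note that the key must include the partition key column:

-- Postgres 11: a primary key on a partitioned table must contain the partition key
CREATE TABLE sms_campaign_subscribers (
    id bigint not null,
    sms_campaign_id bigint not null,
    PRIMARY KEY (id, sms_campaign_id)
) PARTITION BY LIST (sms_campaign_id);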

I hope this write-up helps you get started exploring the partitioning features in Postgres.

Audit logging with Postgres partitioning

As I mentioned in my previous post  “Audit logging using JSONB in Postgres” , audit tables can be partitioned easily in Postgres 10.

Let’s use Postgres partitioning on the sample users_audit table…

Drop the existing trigger on the live table, and drop the users_audit table

drop trigger users_audit_trig ON public.users;
drop table if exists audit.users_audit;

Create the partitioned table using RANGE partitioning on the audit_ts timestamp column…

set search_path to audit;

create table audit.users_audit(
audit_ts timestamptz not null default now(),
operation varchar(10) not null,
username text not null default "current_user"(),
before jsonb, 
after jsonb
) partition by RANGE (audit_ts);

Create child tables…

CREATE TABLE audit.users_audit_2018_07 PARTITION OF audit.users_audit FOR VALUES FROM ('2018-07-01') TO ('2018-08-01');
CREATE TABLE audit.users_audit_2018_08 PARTITION OF audit.users_audit FOR VALUES FROM ('2018-08-01') TO ('2018-09-01');
CREATE TABLE audit.users_audit_2018_09 PARTITION OF audit.users_audit FOR VALUES FROM ('2018-09-01') TO ('2018-10-01');
CREATE TABLE audit.users_audit_2018_10 PARTITION OF audit.users_audit FOR VALUES FROM ('2018-10-01') TO ('2018-11-01');
CREATE TABLE audit.users_audit_2018_11 PARTITION OF audit.users_audit FOR VALUES FROM ('2018-11-01') TO ('2018-12-01');
CREATE TABLE audit.users_audit_2018_12 PARTITION OF audit.users_audit FOR VALUES FROM ('2018-12-01') TO ('2019-01-01');

Create the required indexes on EACH child partition (shown here for the 2018_07 partition; repeat for the others)…

create index on audit.users_audit_2018_07 (audit_ts desc,operation);
create index on audit.users_audit_2018_07 using GIN(before);
create index on audit.users_audit_2018_07 using GIN(after);
create index on audit.users_audit_2018_07 using GIN ((after->'userid'));

NOTE: You have to pre-create these child tables, as well as their indexes, in advance, so you should come up with a process to create them ahead of time, for example with some kind of scheduled script or automation that creates new child tables automatically. A sketch of one possible script follows.
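
One possible approach, sketched below under the assumption of monthly partitions and the naming convention above, is a small DO block you run periodically (for example from cron) to pre-create the next few months and their indexes:

-- Hedged sketch: pre-create the next 6 monthly partitions and one of their indexes
DO $$
DECLARE
    m date := date_trunc('month', now())::date;
BEGIN
    FOR i IN 1..6 LOOP
        EXECUTE format(
            'CREATE TABLE IF NOT EXISTS audit.users_audit_%s PARTITION OF audit.users_audit
               FOR VALUES FROM (%L) TO (%L)',
            to_char(m, 'YYYY_MM'), m, m + interval '1 month');
        EXECUTE format(
            'CREATE INDEX IF NOT EXISTS users_audit_%s_audit_ts_operation_idx
               ON audit.users_audit_%s (audit_ts desc, operation)',
            to_char(m, 'YYYY_MM'), to_char(m, 'YYYY_MM'));
        m := (m + interval '1 month')::date;
    END LOOP;
END $$;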

Place the trigger back on USERS table..

CREATE TRIGGER users_audit_trig
 BEFORE INSERT OR UPDATE OR DELETE
 ON public.users
 FOR EACH ROW
 EXECUTE PROCEDURE public.users_audit_trig();

Update the record for userid=101

app=# select * from public.users where userid=101;
-[ RECORD 1 ]-+-------------------------
userid | 101
username | resslement2s
first_name | Rudiger
last_name | Esslement
city | Baltimore
state | Maryland
email | resslement2s@auda.org.au
phone | 915-528-7033
is_like_sport | t

app=# update public.users set first_name='Denish', last_name='Patel' where userid=101;
UPDATE 1

Let’s query users_audit table..

app=# select * from audit.users_audit;
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
audit_ts | 2018-07-09 14:01:39.373666-04
operation | UPDATE
username | denishpatel
before | {"city": "Baltimore", "email": "resslement2s@auda.org.au", "phone": "915-528-7033", "state": "Maryland", "userid": 101, "username": "resslement2s", "last_name": "Esslement", "first_name": "Rudiger", "is_like_sport": true}
after | {"city": "Baltimore", "email": "resslement2s@auda.org.au", "phone": "915-528-7033", "state": "Maryland", "userid": 101, "username": "resslement2s", "last_name": "Patel", "first_name": "Denish", "is_like_sport": true}

As you can see below, the record was added ONLY to the 2018_07 child table…

app=# select * from audit.users_audit_2018_08;
 (0 rows)
app=# select * from audit.users_audit_2018_07;
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
audit_ts | 2018-07-09 14:01:39.373666-04
operation | UPDATE
username | denishpatel
before | {"city": "Baltimore", "email": "resslement2s@auda.org.au", "phone": "915-528-7033", "state": "Maryland", "userid": 101, "username": "resslement2s", "last_name": "Esslement", "first_name": "Rudiger", "is_like_sport": true}
after | {"city": "Baltimore", "email": "resslement2s@auda.org.au", "phone": "915-528-7033", "state": "Maryland", "userid": 101, "username": "resslement2s", "last_name": "Patel", "first_name": "Denish", "is_like_sport": true}

Make sure constraint_exclusion = partition is set in postgresql.conf so queries can use partition pruning. The query below reads data from a single child partition because the WHERE clause includes the partition key (audit_ts).

app=# explain analyze select * from audit.users_audit where audit_ts >= '2018-07-01' and audit_ts < '2018-08-01' and after->>'userid'='101';
 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append (cost=4.17..9.52 rows=1 width=142) (actual time=0.010..0.011 rows=1 loops=1)
 -> Bitmap Heap Scan on users_audit_2018_07 (cost=4.17..9.52 rows=1 width=142) (actual time=0.010..0.010 rows=1 loops=1)
 Recheck Cond: ((audit_ts >= '2018-07-01 00:00:00-04'::timestamp with time zone) AND (audit_ts < '2018-08-01 00:00:00-04'::timestamp with time zone))
 Filter: ((after ->> 'userid'::text) = '101'::text)
 Heap Blocks: exact=1
 -> Bitmap Index Scan on users_audit_2018_07_audit_ts_operation_idx (cost=0.00..4.17 rows=2 width=0) (actual time=0.004..0.004 rows=1 loops=1)
 Index Cond: ((audit_ts >= '2018-07-01 00:00:00-04'::timestamp with time zone) AND (audit_ts < '2018-08-01 00:00:00-04'::timestamp with time zone))
 Planning time: 0.239 ms
 Execution time: 0.032 ms
(9 rows)

As you can see, it’s very easy to use partitioning in Postgres 10, and the upcoming Postgres 11 release comes with even better partitioning features.

Stay tuned!

Audit logging using JSONB in Postgres

Recently, someone reached out to me asking “what’s the best way to achieve database DML auditing in Postgres?”

I have suggested a couple of options below to achieve DML auditing in Postgres:

  1. Audit using Postgres logs. Postgres can keep track of DML statements at the database level by logging them directly in the Postgres logs. You can use the https://github.com/pgaudit/pgaudit extension to make the audit logging consistent and readable.
  2. Audit tables using a trigger-based approach: create an audit schema in the live database and keep the audit tables updated through triggers.

Both of these approaches have pros and cons.

If you are looking for detailed auditing (including SELECT) at the database level, you can use the pgaudit extension. However, you will have to figure out how to make the audit data queryable for end users. Additionally, you enable it at the database level rather than for specific tables.

On the other hand, if you are only concerned about auditing DML on specific tables, and perhaps only specific columns, the trigger-based approach is your answer, although you will have to deal with audit schema growth. In return, the audit tables are readily available to query without any further processing or tools. If the audit tables mirror the live schema exactly, the trigger-based approach requires changing the audit tables whenever you change the live schema, which adds overhead to managing the audit schema. However, if you use a Postgres JSONB column to keep track of the audit data, you avoid having to keep the audit tables in sync with schema changes on the live tables.

In this post, I have explored a way to use JSONB data type to store auditing details using trigger based approach.

Let’s create a sample USERS table:

create schema if not exists public;

create table public.users(
userid serial primary key,
username text not null,
first_name text not null,
last_name text not null,
city varchar(30) not null,
state varchar(30) not null,
email text not null,
phone varchar(30),
is_like_sport boolean default false
);

Create the audit schema and the USERS_AUDIT table to keep track of changes. As you can see below, the before and after columns are JSONB.

create schema if not exists audit;

create table audit.users_audit(
audit_ts timestamptz not null default now(),
operation varchar(10) not null,
username text not null default "current_user"(),
before jsonb, 
after jsonb
);

Create a trigger function and a trigger on the USERS table to keep track of audit changes

CREATE OR REPLACE FUNCTION public.users_audit_trig()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
begin

  IF TG_OP = 'INSERT' THEN
    INSERT INTO audit.users_audit (operation, after)
    VALUES (TG_OP, to_jsonb(NEW));
    RETURN NEW;

  ELSIF TG_OP = 'UPDATE' THEN
    IF NEW != OLD THEN
      INSERT INTO audit.users_audit (operation, before, after)
      VALUES (TG_OP, to_jsonb(OLD), to_jsonb(NEW));
    END IF;
    RETURN NEW;

  ELSIF TG_OP = 'DELETE' THEN
    INSERT INTO audit.users_audit (operation, before)
    VALUES (TG_OP, to_jsonb(OLD));
    RETURN OLD;
  END IF;

end;
$function$ ;

CREATE TRIGGER users_audit_trig
 BEFORE INSERT OR UPDATE OR DELETE
 ON public.users
 FOR EACH ROW
 EXECUTE PROCEDURE public.users_audit_trig();

I have used https://mockaroo.com/ to generate mock data for the users table.

(Screenshot: Mockaroo mock data configuration for the users table)

Copy mock data into users table

app=# copy users from '/usr/local/var/postgres/mockdata/MOCK_DATA.csv' with header CSV;
COPY 1000

To see how Mockaroo generates the mock data, let’s query userid=101

app=# select * from users where userid=101;
-[ RECORD 1 ]-+-------------------------
userid | 101
username | resslement2s
first_name | Rudiger
last_name | Esslement
city | El Paso
state | Texas
email | resslement2s@auda.org.au
phone | 915-528-7033
is_like_sport | t

The audit table row for userid=101 looks like…

app=# select * from audit.users_audit where after->>'userid'='101';
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
audit_ts | 2018-07-05 14:39:06.960812-04
operation | INSERT
username | denishpatel
before |
after | {"city": "El Paso", "email": "resslement2s@auda.org.au", "phone": "915-528-7033", "state": "Texas", "userid": 101, "username": "resslement2s", "last_name": "Esslement", "first_name": "Rudiger", "is_like_sport": true}

INSERT audit records are straightforward to query.

Let’s update the city and state columns where userid=101

app=# update users set city='Baltimore',state='Maryland' where userid=101;
UPDATE 1

You probably want to see only the changed columns from the before and after values, so let’s create a utility function, jsonb_diff, to show the difference between two JSONB values…

CREATE OR REPLACE FUNCTION audit.jsonb_diff(l JSONB, r JSONB) RETURNS JSONB AS
$json_diff$
 SELECT jsonb_object_agg(a.key, a.value) FROM
 ( SELECT key, value FROM jsonb_each(l) ) a LEFT OUTER JOIN
 ( SELECT key, value FROM jsonb_each(r) ) b ON a.key = b.key
 WHERE a.value != b.value OR b.key IS NULL;
$json_diff$
 LANGUAGE sql;

Now, query audit table to see UPDATED values..

app=# select after->>'userid' as userid , audit.jsonb_diff(before,after) as before_change , audit.jsonb_diff(after,before) as after_change from audit.users_audit where operation='UPDATE';
-[ RECORD 1 ]-+--------------------------------------------
userid | 101
before_change | {"city": "El Paso", "state": "Texas"}
after_change | {"city": "Baltimore", "state": "Maryland"}

Postgres allows creating indexes on an entire JSONB column as well as on a specific key inside the JSONB.

Let’s create some useful indexes…

app=# create index idx_users_audit_audit_ts_operation on audit.users_audit (audit_ts desc,operation);
CREATE INDEX
app=# create index idx_gin_users_audit_before on audit.users_audit using GIN(before);
CREATE INDEX
-- create index on entire after JSONB object
app=# create index idx_gin_users_audit_after on audit.users_audit using GIN(after);
CREATE INDEX
-- create index on userid key from after object
app=# create index idx_gin_users_audit_after_userid on audit.users_audit using GIN ((after->'userid'));
CREATE INDEX
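
For what it’s worth, GIN indexes on JSONB support the containment (@>) and existence (?) operators rather than plain equality, so queries written with containment can take advantage of the indexes above; a hedged sketch:

-- containment query that can use the GIN index on the whole "after" column
select audit_ts, operation from audit.users_audit where after @> '{"userid": 101}';

-- containment on the extracted key can use the expression index on (after->'userid')
select audit_ts, operation from audit.users_audit where after->'userid' @> '101';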

The audit tables grow really fast, so you will want to partition them. Postgres 10 makes table partitioning much easier to manage, so you should convert audit tables into partitioned tables, keep only X months of data in the production database, and archive the rest of the partitions off to cheaper storage, e.g. Amazon S3 or Glacier. The upcoming Postgres 11 release includes even more partitioning performance features, which will help keep up DB performance as users and data grow.

Hope this is helpful to someone architecting an audit schema in Postgres.

Any suggestions/ comments welcome.

Connecting Postgres to Active Directory for Authentication

I have connected our Postgres instance to Active Directory for authentication because I didn’t want to manage passwords for hundreds of users accessing mission-critical databases. Authentication is performed by Active Directory, but authorization (managing groups and permissions at the table/column/row level) is still handled by Postgres. The biggest advantage of connecting Postgres to AD for real users is that I don’t have to worry about passwords and password policies, because those are handled by AD. On the other hand, database authentication for users becomes dependent on the AD infrastructure, so please make sure the AD infrastructure is redundant enough, with a low-latency connection, before making this decision for your organization.

I have come up with simple steps to connect Postgres to Active Directory. TBH, if you can get a service account to connect to your organization’s Active Directory, the rest of the setup is fairly straightforward.

Step-1 :

Get the Active Directory server name and test the connection from the Postgres server:

postgres-dbserver$ nc -v ldap-server.example.com 389
Connection to ldap-server.example.com 389 port [tcp/ldap] succeeded!
Step-2 :

Install the ldap-utils package (openldap-clients on RHEL/CentOS) so you can use the LDAP command-line tools to test connections.

yum install openldap-clients
or
apt-get install ldap-utils
Step-3 :

Ask your AD/LDAP admins for the full path of the OU and CN for your organization and construct the ldapsearch query. I’m using sAMAccountName as the username attribute, but you can use other attributes, e.g. uid.

ldapsearch -v -x -H ldap://ldap-server.example.com -b "OU=Users,OU=local,DC=example,DC=com" -D "CN=service-account-username,OU=ServiceAccounts,OU=Users,OU=local,DC=example,DC=com" -W "(sAMAccountName=dpatel)"

If you are using uid for the validation,  your search query will look like this …

ldapsearch -v -x -H ldap://ldap-server.example.com -b "OU=Users,OU=local,DC=example,DC=com" -D "CN=service-account-username,OU=ServiceAccounts,OU=Users,OU=local,DC=example,DC=com" -W "(uid=dpatel)"

If the above query returns a result, you have successfully found the user in Active Directory under the specified path. If you can’t find the user, please work with your Active Directory admin to find the correct basedn (the -b option in the query above).

Step-4 :

Add the following line in pg_hba.conf and reload the config. Postgres reads pg_hba.conf from top to bottom, so make sure to add this line at the end so that authentication for other application users can be performed without Active Directory.

host all all 192.168.0.0/16 ldap ldapserver=ldap-server.example.com ldapbasedn="OU=Users,OU=local,DC=example,DC=com" ldapbinddn="CN=service-account-username,OU=ServiceAccounts,OU=Users,OU=local,DC=example,DC=com" ldapbindpasswd="Testing123" ldapsearchattribute="sAMAccountName"

* * Passwords with special characters like @! don’t parse properly in pg_hba.conf. Plus, I had to provide a plain-text password in pg_hba.conf for the AD service account. If you know an alternative, please let me know 🙂

Step-5:
psql> create role dpatel login;
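
Authentication now comes from AD, but authorization stays inside Postgres, so the new role still needs grants; a minimal, hedged sketch (the database and schema names are just placeholders):

-- authorization is still managed inside Postgres
grant connect on database app to dpatel;
grant usage on schema public to dpatel;
grant select on all tables in schema public to dpatel;
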
Step-6 :

Test the connection, and look in the Postgres logs for success or errors.

Hopefully, this post will help someone integrate Postgres with Active Directory using the LDAP protocol.

Tracing Tableau to Postgres connectivity issue using Wireshark!

I spent the last couple of weeks trying to resolve connection issues from Tableau Server 10.3 to Postgres 9.6.6. If you are not familiar with Tableau, it is a popular enterprise-grade visualization tool with advanced analytics capabilities for understanding and visualizing data. As it is very popular in the industry, it is obvious that Tableau has to talk to a popular database in the industry, which is Postgres!

Issue: My developers could connect to the Postgres DB using Tableau Desktop on their laptops without any issue and make a LIVE connection to the Postgres 9.6 database. However, when they published the same dashboard with a LIVE connection to the database, it was hanging on the Tableau server.

To reproduce the issue, we installed Tableau Desktop on the Tableau server itself and tried to use it. The Desktop was hanging while connecting to the Postgres database. This made it clear that there was a problem connecting from the Tableau server only. Over the last couple of weeks I looked at various things to understand and resolve the issue:

  1. Database encoding: UTF8 vs ANSI
  2. Tested Postgres ODBC drivers on Tableau Server (Windows 2016)
  3. A lot of troubleshooting, because the connection to an empty database was working!!
  4. Working with Tableau support to explain and track down the issue
  5. Windows settings

None of the above helped!

Finally, Wireshark came to the rescue! We traced the network traffic on port 5432 on the Windows 2016 server while it was making the connection to the Postgres database running on a Linux server.

In the Wireshark logs, filtering the TCP traffic down to the conversation with the Postgres database shows that the TCP/IP sequencing is inconsistent. Once a connection is established, sequence numbers are maintained on each side of the connection. In this case, packets are sent and received between the Postgres server (PDB) and Tableau Desktop (TD). The next expected sequence number is simply the current sequence number plus the payload length, which is how each side verifies that no data was lost.

Typically, TCP data transmission is sequentially ordered. Each packet has an acknowledgement number, known as the ACK. Its value is equal to the next expected sequence number from the packet the server has just received.

Like pages in a book, we expect a sequential order. The Wireshark logs indicate the PDB sequence number is not consistent with the TD sequence number as shown below:

1. PDB > TD Sequence=1;Length=14;Next Sequence=15
2. TD > PDB Sequence=1;Length=34;Next Sequence=35;Acknowledge=15
3. PDB > TD Sequence=15;Length=304;Next Sequence=319;Acknowledge=35
4. TD > PDB Sequence=35;Length=174;Next Sequence=209;Acknowledge=319

In the good packet sequence above, think of it like pages in a book. The first line says the database sent 14 pages, so the next packet from Tableau Desktop should acknowledge page 15. In the second line, Tableau Desktop responds starting on page 1 with 34 pages in total, so the next packet from Postgres should acknowledge page 35. Third, on the database side, PDB tells Tableau Desktop that page 15 is its starting page with a total of 304 pages, so the next acknowledgement from Tableau Desktop should be 319.

Let’s look at the problematic lines in the screenshot below, for the packets captured between numbers 7701-7710:

(Screenshot: Wireshark capture of the bad TCP packet sequence)

PDB > TD Sequence=1283;Length=32;Next Sequence=1315;Acknowledge=1963
TD > PDB Sequence=1963;Length=40;Next Sequence=2003;Acknowledge=1315
PDB > TD Sequence=1315;Length=22;Next Sequence=1337;Acknowledge=2003
TD > PDB Sequence=2003;Length=343;Next Sequence=2346;Acknowledge=1337
PDB > TD Sequence=9529;Length=1449;Next Sequence=10978;Acknowledge=2346
TD > PDB Sequence=2346;Length=0;;Acknowledge=1337

In the second-to-last line above, data has been skipped: TD expects PDB’s next packet to start at byte #1337, but instead it starts at byte #9529. Normally, in situations where the bytes don’t match, this would indicate that some of the packets simply didn’t get captured.

For example, Wireshark might just have missed a few packets. But the ACK value in the last packet we captured from TD > PDB clearly states that it expects PDB to begin its next packet at byte #1337. Not only did the sequence number jump unexpectedly, it jumped by a suspicious amount: 8,192, the exact number of bytes in 8KB (8*2^10 = 8192).

As a result of this mismatch, the Tableau Desktop machine recognizes from the sequence numbers that data has been skipped and re-requests a packet starting at byte #1337; however, PDB does not respond.

To track down packet loss, you can ping the other server with different packet sizes…

 PS C:\Windows\system32> ping 192.168.10.100 -l 2000 -f
 
 Pinging 192.168.10.100 with 2000 bytes of data:
 Reply from 192.168.10.100: bytes=2000 time<1ms TTL=64
 Reply from 192.168.10.100: bytes=2000 time<1ms TTL=64
 Reply from 192.168.10.100: bytes=2000 time<1ms TTL=64
 Reply from 192.168.10.100: bytes=2000 time<1ms TTL=64
 
 
 
 PS C:\Windows\system32> ping 192.168.10.100 -l 2100 -f
 
 Pinging 192.168.10.100 with 2100 bytes of data:
 Request timed out.
 Request timed out.
 Request timed out.
 Request timed out.

As you can see in the output above, when the packet size is larger than roughly 2000 bytes, the packet loss is 100%.

In terms of root cause, it turned out to be an MTU size mismatch. Once the MTU size was adjusted on the switch and on both the Tableau Windows server and the Postgres Linux database server, the packet loss was resolved and Tableau managed to connect to Postgres without any issue.

It was a roller-coaster experience dealing with a mysterious packet loss issue. In case you come across a similar connectivity issue between an application and a database, you might want to make sure you are not suffering packet loss. Wireshark can be your friend 🙂

Thanks for reading and happy holidays 🙂

Christmas Gift!

Merry Christmas!!

If you are using SQL Server or Oracle databases, I’m giving away 1 hour of my time for the rest of this week, free of cost, to discuss how you can save $$$$ by migrating them to PostgreSQL while keeping the same features and achieving better performance.

You can shoot an email to denish.j.patel@gmail.com

Happy holidays!

Running Postgres in Docker

For the last six months, I have been working on moving Postgres from bare-metal and VM-based systems into Docker. As of today, we have migrated a couple of mission-critical Postgres DBs (~2TB) onto the Docker environment.

During the migration journey, I have listed some of the things to consider when running Postgres production instances in a Docker environment.

  1.  Do not use the default Postgres Docker image. Start with your own Docker image built from scratch.
  2.  Handle Postgres service shutdown gracefully in Docker with SIGTERM
  3.  The OS can be stored in the Docker container
  4.  The data volume MUST be stored on persistent storage
  5. Use some kind of framework to manage Docker containers
    • Apache Mesos & Aurora
    • OpenStack & Kubernetes
  6.  You can mount NetApp volumes for backups and WAL files onto the container
  7.  Make resource templates for different kinds of workloads
    • Aurora job resource templates
    • postgresql.conf templates
  8.  Use static IPs for services: DBs, front-end and back-end servers
    • It will be easier to control access at the container level for better security
    • Failover is easier to manage with a static IP for the master DB server
  9.  Benchmark your existing system and compare it with the new system
    • Keep an eye on TPS using pgbench and benchmarksql
  10.  Set up a monitoring solution for the Postgres DB
    • collectd or other agent-based monitoring
      • pg_stat_statements is very useful
    • The Docker container should be monitored separately
      • docker stats
  11.  Use a backup container for taking backups
  12.  Use a standby container for setting up standby jobs
I hope this will be useful for someone working on migrating Postgres into a Docker environment!

YUM repo location & HTTPS changes for Postgres packages !

As Devrim blogged about the Postgres YUM repo changes, I wanted to write down the procedure I had to follow this morning 🙂

For example,

If you want to update the YUM repo for Postgres 9.4 on CentOS x86_64, you can update it by executing the command below:

yum localinstall https://download.postgresql.org/pub/repos/yum/9.4/redhat/rhel-6-x86_64/pgdg-centos94-9.4-3.noarch.rpm

Now, you can follow the normal Postgres version upgrade procedure.

Hats off to Devrim for building the new packages and all the recent improvements in the repo management process! He updated the wiki page as well.

Hope this will help someone 🙂

Recovering Postgres database from disk level corruption!!

A couple of weeks ago, I had to deal with a corrupted Postgres database cluster. In the end, we couldn’t recover some of the data, but we managed to recover most of it. Having experience working with dozens of database systems, I was pleasantly surprised by the resiliency of Postgres.

Kudos to the Postgres development team for building the most resilient database in the world 🙂

Here is my Postgres database recovery story 🙂

Disclaimer:

I'm posting the steps carried out during the recovery process for informational purposes only. This post doesn't provide any guarantee that they will work for your use case and/or environment.

Note: The actual database name has been replaced with “dbname”  and actual table names with “tablename”.

One of the databases in a Postgres cluster experienced disk-level corruption, so we were hitting this error:

postgres=# \c dbname
FATAL: could not read block 0 in file "base/16389/11930": Input/output error

Uh oh?? Really bad, isn’t it? Fortunately, it wasn’t a mission-critical system, so we managed to take an extended outage and work on a partial recovery, because we didn’t want to lose all the data!!

Once we received the complaint, we immediately backed up the corrupted database and created a recovery cluster to bring up on a different server so we could go through the recovery efforts!!

Trial 1:

As many of you know, the first option is to bring up the recovery database cluster with zero_damaged_pages=on. You can set the value in the Postgres config file and try to reindex the system catalogs:

reindexdb -p 5433 --system dbname
reindexdb: could not connect to database dbname: FATAL: index "pg_index_indexrelid_index" contains unexpected zero page at block 0 
HINT:  Please REINDEX it.

Doh! We still could not connect to the database!!

Trial 2:

If you aren’t aware, note that there is a way to ignore indexes at the system level. We started up the recovery cluster with the ignore_system_indexes=true setting:

pg_ctl -D /data -o '-c ignore_system_indexes=true'
restarted
dbname=# \c dbname

Yay! I could connect to the DB now!

Trial 3:

Let’s try to reindex the database…

dbname=# reindex database "dbname";
NOTICE: table "pg_catalog.pg_class" was reindexed
2016-08-22 15:53:14.179 PDT rhost=[local] app=psql:user=postgres:db=dbname:ERROR: could not create unique index "pg_statistic_relid_att_inh_index"
2016-08-22 15:53:14.179 PDT rhost=[local] app=psql:user=postgres:db=dbname:DETAIL: Key (starelid, staattnum, stainherit)=(2608, 5, f) is duplicated.
2016-08-22 15:53:14.179 PDT rhost=[local] app=psql:user=postgres:db=dbname:STATEMENT: reindex database "dbname";
ERROR: could not create unique index "pg_statistic_relid_att_inh_index"
DETAIL: Key (starelid, staattnum, stainherit)=(2608, 5, f) is duplicated.

As the table is corrupted with duplicate entries, let’s find them and fix them.

dbname=# select starelid, staattnum, stainherit from pg_catalog.pg_statistic where starelid=2608 order by 2;
starelid | staattnum | stainherit
----------+-----------+------------
2608 | 1 | f
2608 | 2 | f
2608 | 3 | f
2608 | 4 | f
2608 | 5 | f
2608 | 5 | f
2608 | 6 | f
2608 | 7 | f
(8 rows)

Let’s remove one of the entries based on its xmin:

dbname=# delete from pg_catalog.pg_statistic where starelid=2608 and staattnum=5 and xmin=1228447;
DELETE 1
Trial 4:

Restarted the REINDEX, but it failed again!!

2016-08-22 16:01:29.698 PDT rhost=[local] app=psql:user=postgres:db=dbname:ERROR: 1 constraint record(s) missing for rel tablename
2016-08-22 16:01:29.698 PDT rhost=[local] app=psql:user=postgres:db=dbname:STATEMENT: reindex database "dbname";
ERROR: 1 constraint record(s) missing for rel tablename
Trial 5:

Let’s try to VACUUM ANALYZE the table

dbname=# vacuum analyze tablename;
2016-08-22 16:04:01.282 PDT rhost=[local] app=psql:user=postgres:db=dbname: 1 constraint record(s) missing for rel tablename
2016-08-22 16:04:01.282 PDT rhost=[local] app=psql:user=postgres:db=dbname:STATEMENT: vacuum analyze tablename;
ERROR: 1 constraint record(s) missing for rel tablename

Hrm… it’s still complaining about the constraint

Trial 6:

Let’s disable the constraint check…

dbname=# update pg_class set relchecks=0 where relname='tablename';
UPDATE 1
The above update fixed the constraint error
Trial 7:

Let’s reindex the database again!

dbname =# reindex database "dbname";

Yay, Reindex is successful.

Once the reindex completed successfully, we restarted the recovery cluster without the zero_damaged_pages and ignore_system_indexes settings.

Partial table recovery through the pg_dump process:

As the database is corrupted, it made sense to kick off pg_dump on it… we kicked off pg_dump, but it was still showing errors for some of the sequences!!

/usr/lib/postgresql/9.4/bin/pg_dump dbname -p 5433 -Fc >recovery_dbname.dmp
2016-08-22 16:22:09.517 PDT rhost=[local] app=pg_dump:user=postgres:db=dbname:ERROR: invalid page in block 0 of relation base/16389/2825248
2016-08-22 16:22:09.517 PDT rhost=[local] app=pg_dump:user=postgres:db=dbname:STATEMENT: SELECT sequence_name, start_value, increment_by, CASE WHEN increment_by > 0 AND max_value = 9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value = -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by < 0 AND min_value = -9223372036854775807 THEN NULL ELSE min_value END AS min_value, cache_value, is_cycled FROM XX_id_seq
pg_dump: [archiver (db)] query failed: ERROR: invalid page in block 0 of relation base/16389/2825248
pg_dump: [archiver (db)] query was: SELECT sequence_name, start_value, increment_by, CASE WHEN increment_by > 0 AND max_value = 9223372036854775807 THEN NULL WHEN increment_by < 0 AND max_value = -1 THEN NULL ELSE max_value END AS max_value, CASE WHEN increment_by > 0 AND min_value = 1 THEN NULL WHEN increment_by < 0 AND min_value = -9223372036854775807 THEN NULL ELSE min_value END AS min_value, cache_value, is_cycled FROM XX_id_seq

We had issues recovering a couple of tables, but we managed to recover most of the tables in the database!

This was our Postgres database recovery story.

Hopefully, it will help someone in case they fall into a corruption situation. Any feedback or other ideas are welcome 🙂