Audit logging with Postgres partitioning

As I mentioned in my previous post, “Audit logging using JSONB in Postgres”, audit tables can be partitioned easily in Postgres 10.

Let’s use Postgres partitioning on the sample users_audit table…

Drop the existing trigger on the live table, and drop the users_audit table:

drop trigger users_audit_trig ON public.users;
drop table if exists audit.users_audit;

Create a partitioned table using RANGE partitioning on the audit_ts timestamp column:

set search_path to audit;

create table audit.users_audit(
audit_ts timestamptz not null default now(),
operation varchar(10) not null,
username text not null default "current_user"(),
before jsonb, 
after jsonb
) partition by RANGE (audit_ts);

Create child tables…

CREATE TABLE audit.users_audit_2018_07 PARTITION OF audit.users_audit FOR VALUES FROM ('2018-07-01') TO ('2018-08-01');
CREATE TABLE audit.users_audit_2018_08 PARTITION OF audit.users_audit FOR VALUES FROM ('2018-08-01') TO ('2018-09-01');
CREATE TABLE audit.users_audit_2018_09 PARTITION OF audit.users_audit FOR VALUES FROM ('2018-09-01') TO ('2018-10-01');
CREATE TABLE audit.users_audit_2018_10 PARTITION OF audit.users_audit FOR VALUES FROM ('2018-10-01') TO ('2018-11-01');
CREATE TABLE audit.users_audit_2018_11 PARTITION OF audit.users_audit FOR VALUES FROM ('2018-11-01') TO ('2018-12-01');
CREATE TABLE audit.users_audit_2018_12 PARTITION OF audit.users_audit FOR VALUES FROM ('2018-12-01') TO ('2019-01-01');

Create the required indexes on EACH partition (shown here for the 2018_07 partition; repeat for the others):

create index on audit.users_audit_2018_07 (audit_ts desc,operation);
create index on audit.users_audit_2018_07 using GIN(before);
create index on audit.users_audit_2018_07 using GIN(after);
create index on audit.users_audit_2018_07 using GIN ((after->'userid'));

NOTE: you have to pre-create these child tables, as well as their indexes, in advance, so come up with a process to create them using some kind of script, or add a trigger on the parent partitioned table to create child tables automatically.
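
For example, here is a minimal sketch of such a script (the function name and index names are my own invention); it could be run monthly via cron or pg_cron to create the next month’s partition and its indexes:

CREATE OR REPLACE FUNCTION audit.create_next_users_audit_partition()
 RETURNS void
 LANGUAGE plpgsql
AS $$
DECLARE
 part_start date := date_trunc('month', now() + interval '1 month');
 part_end   date := part_start + interval '1 month';
 part_name  text := 'users_audit_' || to_char(part_start, 'YYYY_MM');
BEGIN
 -- create next month's partition if it doesn't already exist
 EXECUTE format('CREATE TABLE IF NOT EXISTS audit.%I PARTITION OF audit.users_audit FOR VALUES FROM (%L) TO (%L)',
  part_name, part_start, part_end);
 -- create the per-partition indexes
 EXECUTE format('CREATE INDEX IF NOT EXISTS %I ON audit.%I (audit_ts DESC, operation)',
  part_name || '_audit_ts_operation_idx', part_name);
 EXECUTE format('CREATE INDEX IF NOT EXISTS %I ON audit.%I USING GIN (after)',
  part_name || '_after_idx', part_name);
END;
$$;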

Place the trigger back on the USERS table:

CREATE TRIGGER users_audit_trig
 BEFORE INSERT OR UPDATE OR DELETE
 ON public.users
 FOR EACH ROW
 EXECUTE PROCEDURE public.users_audit_trig();

Update the record for userid=101

app=# select * from public.users where userid=101;
-[ RECORD 1 ]-+-------------------------
userid | 101
username | resslement2s
first_name | Rudiger
last_name | Esslement
city | Baltimore
state | Maryland
email | resslement2s@auda.org.au
phone | 915-528-7033
is_like_sport | t

app=# update public.users set first_name='Denish', last_name='Patel' where userid=101;
UPDATE 1

Let’s query the users_audit table…

app=# select * from audit.users_audit;
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
audit_ts | 2018-07-09 14:01:39.373666-04
operation | UPDATE
username | denishpatel
before | {"city": "Baltimore", "email": "resslement2s@auda.org.au", "phone": "915-528-7033", "state": "Maryland", "userid": 101, "username": "resslement2s", "last_name": "Esslement", "first_name": "Rudiger", "is_like_sport": true}
after | {"city": "Baltimore", "email": "resslement2s@auda.org.au", "phone": "915-528-7033", "state": "Maryland", "userid": 101, "username": "resslement2s", "last_name": "Patel", "first_name": "Denish", "is_like_sport": true}

As you can see below, the record was added ONLY to the 2018_07 child table…

app=# select * from audit.users_audit_2018_08;
 (0 rows)
app=# select * from audit.users_audit_2018_07;
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
audit_ts | 2018-07-09 14:01:39.373666-04
operation | UPDATE
username | denishpatel
before | {"city": "Baltimore", "email": "resslement2s@auda.org.au", "phone": "915-528-7033", "state": "Maryland", "userid": 101, "username": "resslement2s", "last_name": "Esslement", "first_name": "Rudiger", "is_like_sport": true}
after | {"city": "Baltimore", "email": "resslement2s@auda.org.au", "phone": "915-528-7033", "state": "Maryland", "userid": 101, "username": "resslement2s", "last_name": "Patel", "first_name": "Denish", "is_like_sport": true}

Make sure constraint_exclusion = partition is set in postgresql.conf so queries can use partition pruning. The query below reads data from a single partition because the WHERE clause includes the partition key (audit_ts).
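
You can check the setting, and override it for the current session if needed, with:

app=# show constraint_exclusion;
app=# set constraint_exclusion = partition;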

app=# explain analyze select * from audit.users_audit where audit_ts >= '2018-07-01' and audit_ts < '2018-08-01' and after->>'userid'='101';
 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Append (cost=4.17..9.52 rows=1 width=142) (actual time=0.010..0.011 rows=1 loops=1)
 -> Bitmap Heap Scan on users_audit_2018_07 (cost=4.17..9.52 rows=1 width=142) (actual time=0.010..0.010 rows=1 loops=1)
 Recheck Cond: ((audit_ts >= '2018-07-01 00:00:00-04'::timestamp with time zone) AND (audit_ts < '2018-08-01 00:00:00-04'::timestamp with time zone))
 Filter: ((after ->> 'userid'::text) = '101'::text)
 Heap Blocks: exact=1
 -> Bitmap Index Scan on users_audit_2018_07_audit_ts_operation_idx (cost=0.00..4.17 rows=2 width=0) (actual time=0.004..0.004 rows=1 loops=1)
 Index Cond: ((audit_ts >= '2018-07-01 00:00:00-04'::timestamp with time zone) AND (audit_ts < '2018-08-01 00:00:00-04'::timestamp with time zone))
 Planning time: 0.239 ms
 Execution time: 0.032 ms
(9 rows)

As you can see, it’s very easy to use partitioning in Postgres 10. The upcoming Postgres 11 release comes with even better partitioning features.

Stay tuned!

Audit logging using JSONB in Postgres

Recently, someone reached out to me asking “what’s the best way to achieve database DML auditing in Postgres?”

I have suggested a couple of options below to achieve DML auditing in Postgres:

  1. Audit using Postgres logs. Postgres lets you keep track of DML statements at the database level by enabling auditing directly in the Postgres logs. You can use the https://github.com/pgaudit/pgaudit extension to make the audit logging consistent and readable.
  2. Audit tables using a trigger-based approach: create an audit schema in the live database and keep the audit tables updated through triggers.

Both of these approaches have pros and cons.

If you are looking for detailed auditing (including SELECT) at the database level, you can use the pgaudit extension. However, you will have to work out how to make the audit data queryable for end users. Additionally, you have to enable it at the database level rather than for specific tables.
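
As a rough sketch based on the pgaudit README (adjust the statement classes to your needs), enabling it looks like this:

-- postgresql.conf: pgaudit must be preloaded (restart required)
shared_preload_libraries = 'pgaudit'
pgaudit.log = 'write, ddl'

app=# create extension pgaudit;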

On the other hand, if you only need to audit DML, and only for specific tables (or even specific columns), the trigger-based approach is your answer. You have to deal with audit schema growth, but the audit tables are readily available to query without any further processing or tools. If the audit tables mirror the live schema exactly, the trigger-based approach requires changing the audit tables whenever you change the live schema, so there is additional overhead in managing the audit schema. However, if you use a Postgres JSONB column to keep track of auditing, you can avoid having to keep the audit tables in sync with the live tables.
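
For example, once the trigger-based setup below is in place, a live-schema change like this (the country column is just an illustration) needs no matching change to the audit table:

alter table public.users add column country text;
-- the audit trigger keeps working unchanged; subsequent audit rows
-- simply carry a "country" key in the before/after JSONB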

In this post, I explore a way to use the JSONB data type to store auditing details with a trigger-based approach.

Let’s create a sample USERS table:

create schema if not exists public;

create table public.users(
userid serial primary key,
username text not null,
first_name text not null,
last_name text not null,
city varchar(30) not null,
state varchar(30) not null,
email text not null,
phone varchar(30),
is_like_sport boolean default false
);

Create the audit schema and a USERS_AUDIT table to keep track of changes. As you can see below, the before and after columns are JSONB.

create schema if not exists audit;

create table audit.users_audit(
audit_ts timestamptz not null default now(),
operation varchar(10) not null,
username text not null default "current_user"(),
before jsonb, 
after jsonb
);

Create a trigger function and a trigger on the USERS table to keep track of audit changes:

CREATE OR REPLACE FUNCTION public.users_audit_trig()
 RETURNS trigger
 LANGUAGE plpgsql
AS $function$
begin

IF TG_OP = 'INSERT'
THEN
 -- new row only; there is no "before" image
 INSERT INTO audit.users_audit (operation, after)
 VALUES (TG_OP, to_jsonb(NEW));
 RETURN NEW;

ELSIF TG_OP = 'UPDATE'
THEN
 -- skip no-op updates where nothing actually changed
 IF NEW != OLD THEN
  INSERT INTO audit.users_audit (operation, before, after)
  VALUES (TG_OP, to_jsonb(OLD), to_jsonb(NEW));
 END IF;
 RETURN NEW;

ELSIF TG_OP = 'DELETE'
THEN
 -- deleted row only; there is no "after" image
 INSERT INTO audit.users_audit (operation, before)
 VALUES (TG_OP, to_jsonb(OLD));
 RETURN OLD;
END IF;
end;
$function$ ;

CREATE TRIGGER users_audit_trig
 BEFORE INSERT OR UPDATE OR DELETE
 ON public.users
 FOR EACH ROW
 EXECUTE PROCEDURE public.users_audit_trig();

I used https://mockaroo.com/ to generate mock data for the users table.

[Screenshot: Mockaroo users data schema]

Copy the mock data into the users table:

app=# copy users from '/usr/local/var/postgres/mockdata/MOCK_DATA.csv' with header CSV;
COPY 1000

To see how Mockaroo generated the mock data, let’s query userid=101:

app=# select * from users where userid=101;
-[ RECORD 1 ]-+-------------------------
userid | 101
username | resslement2s
first_name | Rudiger
last_name | Esslement
city | El Paso
state | Texas
email | resslement2s@auda.org.au
phone | 915-528-7033
is_like_sport | t

The audit table row for userid=101 looks like this…

app=# select * from audit.users_audit where after->>'userid'='101';
-[ RECORD 1 ]-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
audit_ts | 2018-07-05 14:39:06.960812-04
operation | INSERT
username | denishpatel
before |
after | {"city": "El Paso", "email": "resslement2s@auda.org.au", "phone": "915-528-7033", "state": "Texas", "userid": 101, "username": "resslement2s", "last_name": "Esslement", "first_name": "Rudiger", "is_like_sport": true}

The audit row for an INSERT statement is straightforward to query.

Let’s update the city and state columns where userid=101:

app=# update users set city='Baltimore',state='Maryland' where userid=101;
UPDATE 1

Usually you only want to see the changed columns from the before and after values. Let’s create a utility function, jsonb_diff, to show the difference between two JSONB values:

CREATE OR REPLACE FUNCTION audit.jsonb_diff(l JSONB, r JSONB) RETURNS JSONB AS
$json_diff$
 SELECT jsonb_object_agg(a.key, a.value) FROM
 ( SELECT key, value FROM jsonb_each(l) ) a LEFT OUTER JOIN
 ( SELECT key, value FROM jsonb_each(r) ) b ON a.key = b.key
 WHERE a.value != b.value OR b.key IS NULL;
$json_diff$
 LANGUAGE sql;

Now, query the audit table to see the updated values…

app=# select after->>'userid' as userid , audit.jsonb_diff(before,after) as before_change , audit.jsonb_diff(after,before) as after_change from audit.users_audit where operation='UPDATE';
-[ RECORD 1 ]-+--------------------------------------------
userid | 101
before_change | {"city": "El Paso", "state": "Texas"}
after_change | {"city": "Baltimore", "state": "Maryland"}

Postgres allows you to create an index on an entire JSONB column as well as on a specific key within the JSONB.

Let’s create some useful indexes…

app=# create index idx_users_audit_audit_ts_operation on audit.users_audit (audit_ts desc,operation);
CREATE INDEX
app=# create index idx_gin_users_audit_before on audit.users_audit using GIN(before);
CREATE INDEX
-- create index on entire after JSONB object
app=# create index idx_gin_users_audit_after on audit.users_audit using GIN(after);
CREATE INDEX
-- create index on userid key from after object
app=# create index idx_gin_users_audit_after_userid on audit.users_audit using GIN ((after->'userid'));
CREATE INDEX
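
As a quick illustration (these queries are my own, not from the original setup), the GIN indexes serve JSONB containment operators such as @>:

-- can use the GIN index on the whole "after" column
app=# select * from audit.users_audit where after @> '{"userid": 101}';
-- can use the expression index on (after->'userid')
app=# select * from audit.users_audit where after->'userid' @> '101';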

Audit tables grow really fast, so you will want to partition them. Postgres 10 makes table partitioning much easier to manage, so you should convert audit tables into partitioned tables, keep only X months of data in the production database, and archive the rest of the partitions off to cheaper storage, e.g. Amazon S3 or Glacier. The upcoming Postgres 11 release includes even more partitioning performance features, which will help keep database performance up as users and data grow.
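
As a sketch of that archival step (using the partitioned setup from the first section above; database and file names are assumptions):

-- detach the oldest partition so it becomes a stand-alone table
ALTER TABLE audit.users_audit DETACH PARTITION audit.users_audit_2018_07;
-- dump it off to cheaper storage, then drop it:
-- pg_dump -t audit.users_audit_2018_07 app | gzip > users_audit_2018_07.sql.gz
DROP TABLE audit.users_audit_2018_07;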

Hope this will be helpful to someone architecting an audit schema in Postgres.

Any suggestions/comments welcome.

Connecting Postgres to Active Directory for Authentication

I have connected our Postgres instance to Active Directory for authentication because I didn’t want to manage passwords for hundreds of users accessing mission-critical databases. Authentication is performed by Active Directory, but authorization (managing groups and permissions at the table/column/row level) is still handled by Postgres. The biggest advantage of connecting Postgres to AD for real users is that I don’t have to worry about passwords and password policies, because those are handled by AD. On the other hand, users’ database authentication becomes dependent on the AD infrastructure, so please make sure your organization’s AD infrastructure is redundant enough, with a low-latency connection, before making this decision.

I have come up with simple steps to connect Postgres to Active Directory. TBH, if you can get a service account to use to connect to your organization’s Active Directory, the rest of the setup is fairly straightforward.

Step-1:

Get the Active Directory server name and test the connection from the Postgres server:

postgres-dbserver$ nc -v ldap-server.example.com 389
Connection to ldap-server.example.com 389 port [tcp/ldap] succeeded!

Step-2:

Install the LDAP client package so you can use the LDAP command-line tools (such as ldapsearch) to test connections. The package is openldap-clients on RHEL/CentOS and ldap-utils on Debian/Ubuntu.

yum install openldap-clients
or
apt-get install ldap-utils

Step-3:

Ask your AD/LDAP admins for the full OU and CN paths for your organization and construct an ldapsearch query. I’m using sAMAccountName as the username attribute, but you can use other attributes, e.g. uid.

ldapsearch -v -x -H ldap://ldap-server.example.com -b "OU=Users,OU=local,DC=example,DC=com" -D "CN=service-account-username,OU=ServiceAccounts,OU=Users,OU=local,DC=example,DC=com" -W "(sAMAccountName=dpatel)"

If you are using uid for validation, your search query will look like this…

ldapsearch -v -x -H ldap://ldap-server.example.com -b "OU=Users,OU=local,DC=example,DC=com" -D "CN=service-account-username,OU=ServiceAccounts,OU=Users,OU=local,DC=example,DC=com" -W "(uid=dpatel)"

If the above query returns a result, you have successfully found the user in Active Directory at the specified path. If you can’t find the user, please work with your Active Directory admin to find the correct basedn (the -b option in the above query).

Step-4:

Add the following line to pg_hba.conf and reload the config. Postgres reads pg_hba.conf from top to bottom and uses the first matching line, so make sure to add this line toward the end so authentication for other application users can still be performed without Active Directory.

host all all 192.168.0.0/16 ldap ldapserver=ldap-server.example.com ldapbasedn="OU=Users,OU=local,DC=example,DC=com" ldapbinddn="CN=service-account-username,OU=ServiceAccounts,OU=Users,OU=local,DC=example,DC=com" ldapbindpasswd="Testing123" ldapsearchattribute="sAMAccountName"
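
Reload the configuration so the new line takes effect:

psql> select pg_reload_conf();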

** Passwords with special characters such as @ or ! don’t parse properly in pg_hba.conf. Also, I had to provide a plaintext password in pg_hba.conf for the AD service account. If you know an alternative, please let me know 🙂

Step-5:

Create a matching role in Postgres. LDAP handles only authentication, so the role must already exist:

psql> create role dpatel login;
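
Since authorization stays in Postgres, grant the new role whatever privileges it needs; for example (the readonly role here is hypothetical):

psql> grant readonly to dpatel;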

Step-6:

Test the connection. Look in the Postgres logs for success or error messages.
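
For example, from a client machine (host, database, and user names are taken from the examples above); enter the AD password when prompted:

psql "host=postgres-dbserver dbname=app user=dpatel"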

Hopefully, this post will help someone integrate Postgres with Active Directory using the LDAP protocol.