Faster & Better VACUUM FULL

                In presentation, I discussed in detail about Bloat issue in PostgreSQL and methods to remove Bloat from the tables/indexes. Now a days, PostgreSQL9.0 is common and  the widely used version for the production use and it’s vital to remind about changes in most important bloat removal tool called “VACUUM FULL”. Until PostgreSQL 9.0, VACUUM FULL was tardy and DBA always stayed away from it and used CLUSTER instead. (Checkout presentation for difference between CLUSTER vs VACUUM FULL)
                The VACUUM FULL statement recovers free space from a table to reduce its size from bloated tables, mostly when VACUUM itself hasn’t been run frequently enough. Before PostgreSQL 9.0 , it was tardy and slow because of the way it was executed: records were read and moved one by one from their source block to a block closer to the beginning of the table. Once the end of the table was emptied, this empty part was removed. This method was very inefficient: moving records one by one creates a lot of random IO.  Additionally, during this reorganization, indexes had to be maintained, making everything even more costly, and fragmenting indexes. It was therefore advised to reindex a table just after a VACUUM FULL.
                Now, the VACUUM FULL statement, as of PostgreSQL 9.0, creates a new table from the current one, copying all the records sequentially. Once all records are copied, index are created back, and the old table is destroyed and replaced. This has the advantage of being much faster. VACUUM FULL still needs an EXCLUSIVE LOCK  during entire operation. The only drawback of this method compared to the old one, is that VACUUM FULL can use as much as two times the size of the table and indexes on disk, as it is creating a new versions of it.

Let’s compare run-time of VACUUM FULL on PostgreSQL 8.4 vs PostgreSQL 9.0

postgres=# create table vacuumtest(id int primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index “vacuumtest_pkey” for table “vacuumtest”
CREATE TABLE
postgres=# insert into vacuumtest select generate_series(1,10000000);
INSERT 0 10000000
postgres=# delete from vacuumtest where id%4=0;
DELETE 2500000
postgres=# vacuum vacuumtest;
VACUUM

On 8.4:
postgres=# vacuum full vacuumtest ;
VACUUM
Time: 61418.197 ms
postgres=# reindex table vacuumtest;
REINDEX
Time: 12212.815 ms

On 9.0:
postgres=# vacuum full vacuumtest ;
VACUUM
Time: 32640.714 ms

                Above results show that VACCUM FULL on PostgreSQL 9.0 is way faster than previous versions. Moreover, VACUUM FULL has couple of advantages over CLUSTER :  it’s faster than CLUSTER because it doesn’t have to build new table using ORDER by clause & you can run VACUUM FULL on tables on which there isn’t any index.
                If you are running any bloat removal tool on the production database, i would recommend you to revisit vacuum parameters and tighten them up a little that makes regular vacuum to run more frequent so it will help to reduce frequency of running more intrusive bloat removal tools!!