Analyzing effect on default_statistics_target on ANALYZE

Last week, I received request to increase PostgreSQL’s config parameter default_statistics_target on one of the very large DB clusters.  The developers filled a ticket to increase the default_statistics_target to 10,000 at cluster level because one of the queries has started  full sequencial scan  instead of index scan.

By default, Postges cluster sets default_statistics_target to 100 but you can set up to 10,000. This parameter defines how many values are to be stored in the list of most common values, and also indicates the number of rows to be inspected by ANALYZE process.

Out of curiosity, I dig into Postgres ANALYZE code to figure out how many rows is being scanned by ANALYZE based on default_statistics_target  value. As you can see in line 1763, ANALYZE inspects rows up to multiple of 300 & default_statistics_target.

1763  stats->minrows = 300 * attr->attstattarget;

If the default_statistics_target value is 100, ANALYZE could scan up to 300 * 100= 30,000 rows to gather accurate statistics.

As per Postgres documentation on ANALYZE,  if you increase the value blindly at cluster level it could hurt the smaller queries because query planning time could increase significantly by setting the default_statistics_target value to maximum, which can eventually degrade the peformance for simple queries.  The next logical step for me to find out optimal value for the default_statistics_target for the specific table with slow query. I will probably start with increasing value to 1000 and increase/decrease value based on EXPLAIN ANALYZE query results.

When you try to tune default_statistics _target for your database server,  hopefully this finding will help. Let me know, if you have alternate views or idea for tuning the setting. Thanks!

3 thoughts on “Analyzing effect on default_statistics_target on ANALYZE”

  1. The Postgres community has a fear of planning time that sometimes approaches paranoia.

    You can see how Postgres stores statistics in their raw form by looking at the pg_statistic (NOT pg_stats) table. The bulk of the storage for any sizable table will be in one or more of the stanumbers and stavalues array columns. Generally, those arrays will be elements long. I’m guessing that searches in those arrays are done with a binary search. So yes, a very large statistics target will increase planning time, but we’re talking microseconds here. In almost any normal usage a statsistics target of 1000 or ever 1000 should be fine.

Comments are closed.