Collecting statistics

For a while we have used DBMS_STATS.GATHER_SCHEMA_STATS with the GATHER STALE option to collect statistics on one of our Oracle 9.2 data warehouses. It was perhaps a little slower than our old approach, but it did the job and was also code not written (and hence not maintained) by us; from past posts you know I am a great believer in the use of in-built functionality, especially if it means a reduction in the lines of hand-crafted code.
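
For the curious, the call is along these lines (the schema name and parameter values here are illustrative, not our exact settings), and the stale detection relies on table monitoring having been switched on so that Oracle tracks DML volumes per table:

    BEGIN
       -- Gather stats only for objects whose monitoring data marks them as
       -- stale (roughly 10% or more of the rows changed since the last analyze).
       DBMS_STATS.GATHER_SCHEMA_STATS(
          ownname          => 'DWH',           -- illustrative schema name
          options          => 'GATHER STALE',
          estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
          method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
          degree           => 4,
          cascade          => TRUE);
    END;
    /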

All was well with our stats collection until the database was patched to the latest release. Although our preferred stats collection method did not break, it slowed to such an extent that we could no longer use it during our batch window. Needless to say, this is under investigation with Oracle support. In the meantime we have reverted to our old technique of collecting stats on all of the partitions that have been updated during the batch and then using the partition stats to synthesise the global table stats. This is close enough for day-to-day use (for this system). At weekends we have a bit more time and can rebuild the global stats from scratch.
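
In outline the fallback looks something like the following sketch (the owner, table and partition names are made up for illustration; the real code drives the partition list from our load control tables, and column-level global stats are left alone, which is part of why this is only "close enough"):

    DECLARE
       l_rows   NUMBER;
       l_blocks NUMBER;
       l_avglen NUMBER;
    BEGIN
       -- Refresh stats on the partition loaded by tonight's batch.
       DBMS_STATS.GATHER_TABLE_STATS(
          ownname     => 'DWH',
          tabname     => 'SALES_FACT',
          partname    => 'P_20040115',      -- illustrative daily partition
          granularity => 'PARTITION',
          cascade     => TRUE);

       -- Synthesise global table stats by rolling up the partition figures.
       SELECT SUM(num_rows),
              SUM(blocks),
              ROUND(SUM(num_rows * avg_row_len) / NULLIF(SUM(num_rows), 0))
         INTO l_rows, l_blocks, l_avglen
         FROM all_tab_partitions
        WHERE table_owner = 'DWH'
          AND table_name  = 'SALES_FACT';

       DBMS_STATS.SET_TABLE_STATS(
          ownname => 'DWH',
          tabname => 'SALES_FACT',
          numrows => l_rows,
          numblks => l_blocks,
          avgrlen => l_avglen);
    END;
    /

The weekend rebuild is then just a GATHER_TABLE_STATS call with granularity => 'GLOBAL', which replaces those rolled-up approximations with properly computed figures.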

You may think that in a data warehouse holding, say, two years of history, missing a day's statistics would not make a great difference. But if the partitioning scheme is by day and a partition is not analyzed then things start to go badly wrong: queries that hit that partition will almost certainly choose the wrong path, bitmap indexes and star transformations are overlooked, query rewrite will also make the wrong choices, and irate users log calls because their 10-minute query is taking an hour.
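
A quick post-batch sanity check helps catch this before the users do; the dictionary view is standard, the owner and table name are again illustrative:

    -- List fact table partitions that have never been analyzed, or whose
    -- stats predate today's load.
    SELECT table_name, partition_name, last_analyzed
      FROM all_tab_partitions
     WHERE table_owner = 'DWH'
       AND table_name  = 'SALES_FACT'
       AND (last_analyzed IS NULL OR last_analyzed < TRUNC(SYSDATE))
     ORDER BY partition_position;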