Rollup the sub partitions

In data warehouse there is often the need to summarise, aggregate, or roll-up (the three terms often being used interchangeably) fact data. This is either achieved at query time within the query or in advance by the use of pre-built summaries. As I blogged yesterday, pre-built summaries are often useful to provide answers to frequently asked queries; they remove the need to repeatedly make the same summation. Historically, such pre-built aggregates have been discrete tables, that is, one table for each aggregation. But it is also possible to create a single table to hold the results of multiple aggregations. This can be achieved by the use of the extensions to GROUP BY and in particular GROUP BY ROLLUP and, perhaps, GROUPING SETS.

But what if I use a single GROUP BY ROLLUP summary, wouldn’t it be at least as large as the sum of individual summaries it replaces? And wouldn’t I then need to table scan a far larger table than my original summary? I hear some of you ask. And if you roll-up multiple dimensions into a single table isn’t some of the data repeated; after all, product by region and product by store data should not be added together to get total product. But Oracle provides a function to resolve the issue of large table scans and knowing which rows can be legitimately summed to make a total; the GROUPING_ID function.

If our fact table is already partitioned by time (a range partition) we could create list sub-partitions based on grouping_id. This means that for a query that has both the date and the grouping_id as predicates partition elimination (or pruning) will reduce our table scan to just the set of rows that resolve both predicates.

We could also wrap the group by rollup table in series of views (one for each grouping_id) and then use 10g rewrite equivalence and query rewrite to select the correct view for a query seemingly against the base level fact table.