DW Design (4) - summary tables
I suppose that if we had unlimited computing power and the ability to read our entire fact table in a single, instant, read then there would be no need to summarise. But in the real world this is not the case. Summaries are used to speed queries by pre-calculating aggregations (often, but not invariably, summations); however this has two costs, the space used to store these aggregate table and the time taken to maintain it. Choose badly and a summary will not be accessed by users and we waste both space and batch time. But choosing well may be a bit of an art form - some combinations of dimensions of course, will be obvious, but other combinations are less than intuitive. The one thing that goes without saying is that it is impractical to cover all possible combinations; it's just as well that we don't have to. The first thing to realise is that there is nothing wrong with summarising a summary (unless we are rolling up time and the aggregate is non-additive, such as bank balance); the total sales value for all geographies/all products/all channels is an important figure for the business but is it worth storing it in its own table when there are only four regions, three channels and ten product groups that need to be added up? The second thing to note is that there is probably little point having an summary table that is not appreciably smaller than its parent unless there is strong business driver that says this summary resolves a vast number of user queries; but if there is little to differentiate a summary and its parent I doubt that there would much of a difference between the summary and parent when rolled up to a higher degree of aggregation.
So, if you have all of those summary tables how do you use them? In the old, pre-8i days you either had to know which summaries existed and choose the best fit for your query or programmed it into your query tool in some form of aggregate awareness; I used to do this with Oracle Sales Analyzer. It was a somewhat tedious task to map in, say 70 tables. Materialized views and query re-write made an amazing improvement to the process, now we only need to map in one or two views (the base level ones) and Oracle can choose the best fit
The next piece will talk about some quirks of materialized views and query rewrite, a little about summary storage and a bit more about the GROUP BY extensions and then finally the need (or not) to re-summarise when business data changes