Summaries

David Aldridge, aka the OracleSponge has recently blogged on the three ‘P’s of Data Warehousing. David commented that the “pseudo P”, summarisation, was perhaps the most important. Well, I must say that I agree with him there.

In whatever form, be they conventional tables or materialized views, summaries can dramatically boost performance for user queries. True they come at the cost of additional storage and the time it takes to refresh the summary as part of the batch, but they do do two things: they move a potentially expensive (in terms of IO resource and computation) query out of the on-line day, and they allow it to happen just once.

The choice of summaries is sometimes a bit of an art, especially with data warehouses with multiple dimensions; sales by item is relatively easy to aggregate, but what when you also take in to account store, customer, and supplier then which combinations of aggregates are needed? One of our data warehouses has 9 dimensions on the sales table, and many of those have more than 3 levels in the dimension hierarchy. Obviously, some combinations are never queried (or at least not by sane analysts) but other combinations will be very frequently used.

One significant advantage of Oracle Materialized Views and, in 10g, rewrite equivalence is that we do not need to map these summary tables into the user query tool; query rewrite will take a query against the base fact table and look for the best-fit aggregation that either fully or partially answers the query. This ability to use tables at an intermediate level of aggregation can help us reduce the number of aggregates we need to create.

Next post I will talk about GROUP BY ROLLUP and sub-partitioning.