Materialized Views - 2

The other day I was asked about the approach I take to refreshing materialized views. To be honest, I use my own techniques, though others do similar things.

In a data warehouse summary table very little actually changes; there is perhaps a small window where new data is added, but even allowing for late-arriving data the vast majority of the data is unchanging. Perhaps this is untrue with type 1 SCD summaries, where items may migrate between parents, but the SC part of SCD is slowly changing, so this should not occur that often.

In the data warehouses I run, we tend to rely heavily on range partitioning on date. This improves maintainability and, perhaps more importantly, allows us to use partition elimination to boost query performance by reducing the amount of data to be manipulated. It is also the key to the way we maintain mviews. We simply do the following for each partition to be refreshed (see the SQL sketch after the list):

  • Truncate partition
  • Make bitmap indexes unusable
  • Append into now empty partition
  • Rebuild unusable indexes
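
A minimal SQL sketch of those four steps, assuming a daily summary SALES_MV (registered as an mview on a prebuilt table, so the underlying table can be maintained directly) with a local bitmap index SALES_MV_STORE_BX - all object and partition names here are hypothetical:

    -- 1. Empty the partition to be refreshed
    ALTER TABLE sales_mv TRUNCATE PARTITION p_20240101;

    -- 2. Mark the local bitmap index partition unusable so the direct
    --    path load does not have to maintain it row by row
    ALTER INDEX sales_mv_store_bx MODIFY PARTITION p_20240101 UNUSABLE;

    -- 3. Direct path (append) insert into the now-empty partition
    INSERT /*+ APPEND */ INTO sales_mv PARTITION (p_20240101)
    SELECT TRUNC(sale_date), store_id, SUM(amount)
    FROM   sales_fact
    WHERE  sale_date >= DATE '2024-01-01'
    AND    sale_date <  DATE '2024-01-02'
    GROUP  BY TRUNC(sale_date), store_id;
    COMMIT;

    -- 4. Rebuild the index partition marked unusable above
    ALTER INDEX sales_mv_store_bx REBUILD PARTITION p_20240101;
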
The query we use to refresh the partition is often not the same as the one used in the mview definition. Doing this allows us to present a whole suite of mviews as being defined from a single master fact table, while taking advantage of the other summary tables to minimise the need to constantly repeat the same aggregations: if we aggregate by both state and by region, it is a lot quicker to roll up state into region than to go back to the store-level fact table again (see the sketch after this list). Making the mview suite appear to have a single fact source solves two problems with mview nesting:
  • it removes the need to construct denormalised reference data tables for the higher-level aggregations - query rewrite requires a foreign key relationship between reference and fact
  • it allows all of the summaries to be considered for rewrite - with nesting, only the first-tier summaries are considered, and the rewritten query is then tested for rewrite against the next tier; if the best first-tier rewrite does not lead to the best overall plan, we end up using the wrong summary
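
As an illustration of that roll-up, a hedged sketch with hypothetical names (SALES_BY_STATE_MV, SALES_BY_REGION_MV, and a STATE_TO_REGION mapping table): the region-level refresh reads the already-built state-level summary rather than the fact table.

    -- Refresh the region-level partition from the state-level summary;
    -- the mview *definition* still references the master fact table,
    -- so query rewrite continues to see a single fact source
    INSERT /*+ APPEND */ INTO sales_by_region_mv PARTITION (p_20240101)
    SELECT s.sale_date, m.region, SUM(s.amount)
    FROM   sales_by_state_mv s
    JOIN   state_to_region m ON m.state = s.state
    WHERE  s.sale_date >= DATE '2024-01-01'
    AND    s.sale_date <  DATE '2024-01-02'
    GROUP  BY s.sale_date, m.region;
    COMMIT;
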
Of course, this is a simplified outline of what goes on. Compression can be applied on data load, since we are using direct path techniques, and parallel load may be an option - but then you need to think about how much data you are loading, as each parallel process will direct path load into its own database blocks and you can quite easily waste space. Both options are sketched below.
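
Again the names are hypothetical, and whether either option pays off depends entirely on the volume being loaded:

    -- Have subsequent direct path loads into this partition stored compressed
    ALTER TABLE sales_mv MODIFY PARTITION p_20240101 COMPRESS;

    -- Parallel direct path load: each parallel slave writes to its own
    -- database blocks above the high-water mark, so for small volumes
    -- a serial load wastes less space
    ALTER SESSION ENABLE PARALLEL DML;

    INSERT /*+ APPEND PARALLEL(sales_mv, 4) */ INTO sales_mv PARTITION (p_20240101)
    SELECT TRUNC(sale_date), store_id, SUM(amount)
    FROM   sales_fact
    WHERE  sale_date >= DATE '2024-01-01'
    AND    sale_date <  DATE '2024-01-02'
    GROUP  BY TRUNC(sale_date), store_id;
    COMMIT;
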