Data Warehousing

Re-summarise versus fast Mview refresh - part 1

Peter Scott

Jul 31, 2006 • 1 min read

A couple of days ago I wrote that we may need re-build summary tables when items change parents. More exactly, it is the tables at the parent level (and above) that need to be refreshed when the parents adopt new offspring

The items that have changed parentage could well be dotted throughout the whole table, and not like the fact load process where we are looking at just a limited rage of dates (or perhaps partitions) This means that we either need to rebuild the whole table - truncate and re-insert or we need to update just the records that have change. If the summary happened to be a materialized view we have the choice of a compete refresh (truncate or delete then insert) or a fast refresh which is basically an update. In some cases the fast option is really the slow option - oh how I hate that use of "fast", it just does not make sense.

One nice thing about materialized views is that they can know when their content no longer matches reality, that is when they become stale - it is a business / DW designer choice whether we ignore or use a stale view. We can also choose what triggers an update to a materialized view. Although it is possible to refresh a view on commit I am not keen on that approach for these reasons

The batch load of reference data is often not completed within a single commit - if we commit every 1000 rows we probably don't want to refresh the view part way through the load
Most summary tables have more than one dimension - why rebuild a view after changes to product parents if the next part of your schedule is to update geography or customer.
We have no predictive metric on how long the rebuild is to take; suppose we made some fundamental change that affected 95% of a summary we would not want that to come a complete surprise to users as they try to use the system first thing on a Monday morning.

Surely the best way to refresh a mview is to make the decision yourself or at least through a more sophisticated set of rules than 'has there been a commit?' It might even be possible to estimate whether complete refresh is faster then fast and choose the best form of refresh for the amount of work to be done.

Sign up for more like this.