Oracle OLAP 10gR2 Incremental Load Improvements: The Mystery Solved…
Right, I've been meaning to post this to the blog for a couple of weeks now, but I keep getting sidetracked. If you followed the debate around my Incremental Load Improvements in Oracle OLAP 10gR2 posting a couple of months ago, you'll be interested in a follow-up Scott Powell posted on the OTN OLAP Forum.
Basically, neither of us could work out whether the improvements were real or just some over-enthusiastic marketing on Oracle's part. In the end, though, an ex-Oracle employee called Dan Peltier put us out of our misery and posted some details of how the improvements actually work, and of how you can use new features such as partitioning and compressed composites to improve the performance of a cube aggregation. I won't repeat the bulk of Dan's posting here, but there's a particularly relevant bit at the end that explains how the improvements were actually implemented (the quoted text at the start is a question from me):
"> Interesting point by Dan (Vlamis?) about compressed composites having a special ability to detect changed > data, and deal with incremental loads faster than non-compressed composites. Certainly compression > itself isn't the answer, as Oracle's slides suggest that incremental loads are faster in 10gR2 compared > to 10gR1, and compression was in both, but maybe there's an improvement in compression in R2 that > gives the effect that Dan mentions when specifically dealing with incremental loads. I'll have to check it > out.There is in fact exactly such an improvement.
If you look at the OLAP DML level, in 10.1 you weren't even allowed to load data into a compressed variable unless you cleared all the aggregates first. Thus, every time you wanted to make a change, you had to clear all the aggregates, change your detail data, and rebuild from scratch.
In 10.2 you're allowed to make changes to an aggregated, compressed variable. If you load some new data and then run the AGGREGATE command, the engine will actually try to do some tricks to aggregate only the parts of the cube that were affected by the changes you made since the last time you aggregated. It's not incredibly good at this, but it can handle some simple cases. If the engine decides that your changes are too complicated, then it just nukes all the aggregates and rebuilds from detail.
Obviously, the slide you saw was for one of the simple cases.
The most trivial case in which the engine can avoid doing a full rebuild is if you haven't made any changes at all to the variable or to the hierarchies. Also, I believe that it can avoid doing a full rebuild if you have added a small number of dimension values to one dimension but have not changed the parent of any existing dimension value in any of the cube's hierarchies - in other words, if you've added a week's worth of data.
AWM does aggregation on a partition-by-partition basis, so there the question actually becomes, have you changed this partition in such a way as to necessitate a full rebuild? The answer may be yes for some partitions and no for others.
Note that none of this applies to regular, non-compressed cubes. However, with uncompressed composites, you can, at the OLAP DML level, keep track manually of which areas of the variable need to be recomputed, and then use that information to set dimension status before running AGGREGATE. Only the areas that are in status get rebuilt. This technique is not allowed for a compressed variable, because the compression technique used generally makes it impossible to recalculate some arbitrary subset of the data."
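To put Dan's explanation into slightly more concrete terms, here's a rough OLAP DML sketch of the "added a week's worth of data" case he describes. All of the object names (SALES, TIME.PARENTREL, SALES_CC and so on) are my own invention, and the exact placement of the COMPRESSED keyword on the composite definition is from memory, so treat this as an illustration of the idea rather than anything lifted from Dan's posting:

    " Illustrative cube: SALES dimensioned by TIME plus a compressed
    " composite over PRODUCT and GEOG (all object names are made up)
    DEFINE time    DIMENSION TEXT
    DEFINE product DIMENSION TEXT
    DEFINE geog    DIMENSION TEXT
    DEFINE time.parentrel    RELATION time    <time>
    DEFINE product.parentrel RELATION product <product>
    DEFINE geog.parentrel    RELATION geog    <geog>
    DEFINE sales_cc COMPOSITE <product geog> COMPRESSED
    DEFINE sales    VARIABLE DECIMAL <time sales_cc <product geog>>

    DEFINE sales_aggmap AGGMAP
    AGGMAP
    RELATION time.parentrel
    RELATION product.parentrel
    RELATION geog.parentrel
    END

    " In 10.1 you had to clear the aggregates before you could touch
    " the detail data in a compressed variable; in 10.2 you can just
    " add the new week, give it a parent, load its detail cells and
    " re-run AGGREGATE.
    MAINTAIN time ADD 'W27_2006'
    time.parentrel(time 'W27_2006') = 'Q2_2006'
    " ... load the new week's detail cells here (SQL FETCH, FILEREAD
    " or straight assignments) ...
    AGGREGATE sales USING sales_aggmap
    " Because only new dimension values were added and no existing
    " parentage changed, the engine can refresh just the affected
    " aggregates instead of rebuilding the whole cube from detail.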
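For comparison, the manual approach Dan describes for regular, non-compressed cubes, where you keep track of the changed slice yourself and use dimension status to restrict the rebuild, would look something like this (again, the object names are mine and this is only a sketch):

    " Assume the same cube, but this time over a plain, non-compressed
    " composite - the status trick below isn't allowed on a compressed
    " variable.
    DEFINE sales_c  COMPOSITE <product geog>
    DEFINE sales_nc VARIABLE DECIMAL <time sales_c <product geog>>

    " Limit status to the slice you know has changed, plus its
    " ancestors up the hierarchy, then aggregate - only the cells
    " in status get recalculated.
    LIMIT time TO 'W27_2006'
    LIMIT time ADD ANCESTORS USING time.parentrel
    LIMIT product TO ALL
    LIMIT geog TO ALL
    AGGREGATE sales_nc USING sales_aggmap
    " Reset status afterwards if you need the full cube in scope again.
    LIMIT time TO ALL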