Data Warehousing

More on summaries

Peter Scott

Aug 9, 2006 • 2 min read

One of my customers runs a chain of supermarkets. The bottom three levels of their product hierarchy are:

Saleable unit: Product as sold - could be price marked, promotion pack, "buy on get one for free" (BOGOFF) or whatever was provided by their suppliers. Each has it own bar code for the POS system
Article: product and size - is it pint, quart, 100g, 200g or whatever
Product: what it is - 'DAZ' washing powder, own-brand tea, Quilly's Throat Lozenges are examples

The level of reduction between Saleable unit and product is often small - if own-label team only comes in one pack size and is never sold in promotional packs then there is one record for each layer; often there is a one-to-one between product and article and no more that five to one between saleable unit and article (and many time just a one to one). Consequentially, a summary at product level is not a lot smaller than a similar one at saleable unit level. But lots of business uses want to know how well a particular brand of coffee is doing and not really care if has "20% off" on the jar or not, others want to know how small packs compare to large so we end up with many queries being run against each of these three levels of the hierarchy.

Conventional wisdom says if users run lots of similar queries then we should look at providing an aggregate to move the pain of adding up all of those rows to a single batch event. But conventional tight-fistedness is not keen on building an aggregate where the disk real estate need is about the same as its source table.

But with Oracle 9 and beyond there are other approaches:

the bitmap join index, here we create a bitmap index on the fact table or the low level summary but use an attribute on the dimension table as the index key in other words we prejoin the index. The syntax is remarkably simple

CREATE BITMAP INDEX pete_s_1 on SUMMARY_1(product.prod_id)
FROM
SUMMARY_1, PRODUCT
WHERE
SUMMARY_1.SU_CODE=PRODUCT.SU_CODE

You may want to pad that out a bit to put the index in the right place and for partitioned summaries use local index partitions. There is a potential downside if the indexed column is volatile, that is subject to frequent change - data in summaries is relatively static; keys do not change value, that may not be the case for an attribute to a dimension, but in our case parentage at such a low level does not change much so we can get away with it!

The other choice is to go with STAR_TRANSFORMATIONS it this case Oracle requires bitmap index on the foreign keys of the dimension table - that is the hierarchy levels. Oracle will than use the bitmap index on the dimension to compute just the members to select for the aggregate - A good write-up is to be found here

Sign up for more like this.