Compressed Composites (Oracle 10g Compression) Explained
If you've an interest in the OLAP Option and you've read some of my recent articles on the new features in Oracle 10g OLAP, you've probably seen a feature called "compression" mentioned. In this DBAZine article I described compression as "a novel form of cube compression, which promises to both enhance query performance (by retrieving fewer blocks of data for a given logical amount of data) and drastically reduce batch loading and aggregation times, saving disk space on the way. Compression in Oracle 10g OLAP is more about improving performance and scalability than saving disk space (although that s a nice side effect). The net result of this is that, in a given batch window, Oracle 10g OLAP can now load and aggregate more data than before, and for a given amount of disk, can store more information than before. This, plus big advances in scalability internally around areas such as very big composites, makes Oracle 10g OLAP potentially a very effective platform when building particularly large cubes." So how does this compression feature actually work?
Good question. I actually found out about the compression feature from talking to a couple of people in the OLAP Product team, and at the time this was news to me as I hadn't seen this feature too well signposted in either the recent (2003) batch of Open World papers or in the online documentation. Indeed I was under the impression that it was only coming with the 10.1.0.3 patch release, but I've now located this feature in the online documentation and it's all a bit clearer now.
It appears that what we're talking about when referring to "compression" in 10g OLAP cubes is actually something called "Compressed Composites". Composites should be fairly familiar to anyone working with Express or Oracle OLAP and are structures that you set up when you've got very sparse cubes. In cases where there are few actual used combinations of dimensions for your variable, compared to the potential set of valid combinations, setting up composites, which only store within them the actual combinations of dimensions used, together with indexes to the underlying dimensions, reduces the amount of NA values stored in the variable and results in more efficient data storage.
So what are compressed composites? Probably the best description is to just quote the OLAP DML Reference R1 (10.1) section on compressed composites:
"In some cases, when you aggregate data in a variable dimensioned by a composite defined with one or more hierarchical dimension, one parent node may have only one descendant node and so on all the way up to the top level. When a variable has a good deal of this type of sparsity, use a compressed composite as the dimension of the variable. Dimensioning this type of variable with a compressed composite creates the smallest possible variable, composite, and composite index much smaller than if you dimension a variable with a b-tree or hash composite.
This reduction in size does not occur at the detail level. Oracle OLAP creates composite values for detail level the same way for all composites. A composite contains one composite tuple for each set of base dimension values that identifies non-NA detail data in the variables that use it.
The reduction in size occurs for those sets of base dimension values that identify non-NA data at higher levels of hierarchical dimensions. Oracle OLAP populates these higher-level values differently depending on whether a variable is dimensioned by a b-tree, hash, or compressed composite:
For variables dimensioned by b-tree and hash composites, Oracle OLAP creates composite tuples for non-NA data at higher levels the same way that it does for non-NA data at the detail level. There is one composite tuple (with its own physical position) for each set of base dimension values that identifies non-NA data. The composite index contains all of the index entries needed to relate the composite tuple to the base dimension values.
For variables dimensioned by compressed composites, Oracle OLAP reduces redundancy in the variable, composite, and composite index by using the"intelligence" of the AGGREGATE command that populates the variable. For sets of base dimension values that represent parent nodes, Oracle OLAP creates a physical position in the composite only for those tuples that represent a parent with more than one descendant. Oracle OLAP then creates an index between this composite structure and the base dimensions and uses this composite structure as the dimension of the variable. Since the actual structure of a compressed composite is smaller than that of a b-tree or hash composite, a variable dimensioned by a compressed composite is also smaller than a variable dimensioned by a b-tree or hash composite. Also, since the index for a compressed composite only has nodes for parents with more than one descendant, the index of a compressed composite has fewer levels and is smaller than the index of a b-tree composite.
Although performance varies depending on the depth of the hierarchies and the order of the dimensions in the composite, aggregating variables defined with compressed composites is typically much faster than aggregating variables defined with b-tree or hash composites."
You can define a composite as compressed using the syntax:
DEFINE name COMPOSITE [AW workspace] COMPRESSED [SESSION]
Alternatively, with the forthcoming 10.1.0.4 patch release and the accompanying new version of Analytic Workspace Manager, you can turn on compression for a cube (which translates to one or more variables) using the GUI.
This presumably tells AWM to create any composites with the COMPRESSED option as detailed above.
One of the things that was mentioned to me when I was first told about compression was that in this initial version, the scope for using it was quite limited. This would appear to be borne out by the reference in the above documentation that states:
"...when you aggregate data in a variable dimensioned by a composite defined with one or more hierarchical dimension, one parent node may have only one descendant node and so on all the way up to the top level. When a variable has a good deal of this type of sparsity, use a compressed composite as the dimension of the variable. Dimensioning this type of variable with a compressed composite creates the smallest possible variable, composite, and composite index much smaller than if you dimension a variable with a b-tree or hash composite"
which would suggest that compression in this release is only of value when you have the specific situation where you're finding parent nodes with only one descendent node. If anyone from the product team is reading, could you add any more detail for this?
Now that this is a bit clearer, the next step is to get hold of a migrated OES database that's going to 10g, identify whether the specific conditions for compressed composites are met, implement them and do some benchmarking. We've got a couple of clients who are looking to implement this feature so I'll report back with results in due course. Once again full details on compressed composites can be found in the OLAP DML Reference available on OTN.