More on moving data
One of my pet hates is moving shed-loads of data from the disk to aggregate it away to resolve a user's query. Experience tells me that the very thought of doing this imposes a massive strain on a data warehouse. Ok, millions of people tell me that I exaggerate, but I don't think I am too far from the truth on this one.
Fortunately database vendors have a few tricks in their armouries that minimise the amount of data being sought, and that of course is a good thing if the moving all of those zeros and ones around is the bottleneck that I think it to be.
Summary tables are perhaps the oldest solution to this: aggregate just once and then let users access the pre-canned results. Modern variants on this include using query rewrite with materialized views; this allows queries to be written against the base fact tables and the optimiser cleverly (or sometimes, not so cleverly) picking a pre-created summary table to supply the result.
But what when only the base fact table can provide the results needed? Two main approaches stand out: read as much as possible in one go, or read no more than is needed. In fact combing both strands looks a good approach.
Use larger block sizes Despite my aversion to reading data, Data Warehouses are designed to read masses of data, so the bigger the block that is read in one go the less visits to disk needed, the faster we go (gross simplification alert!). And then given the 'write-once, read many nature of a DW, the ability to compress more rows into a block further advantage. Oracle can really squeeze the data in, choose a good order by key and bulk insert and 5TB could be reduced to just one.
The other key to reducing the volume of data to work against is the use of partitioning to divide a table into smaller chunks. Although some people think of partitioning as tool to improve the life of a DBA (the divide and conquer school of space management) there is a lot to be said for letting the optimiser pick the sections of the table that are actually relevant
There are of course a lot of traps for the unaware with these features - perhaps I'll comment more next time.