My data warehouse is bigger than yours!
Sadly, it seems ingrained in Western culture that having more of something than someone else is "good". This idea that "my dad's tougher / bigger / richer / cleverer (insert your choice here) than yours" has been the stuff of schoolyard taunts down the ages. But for a data warehouse, is bigness a desirable attribute? Vendors seem keen to quote the size of the biggest systems running on their hardware, operating systems or databases (perhaps it is the kid in them), but for normal folk like us, does size matter that much?
Size brings problems
Not problems that can't be solved, but the sort that needs thought to resolve before making design commitments. Sadly, these problems are not always helped by vendor marketing hype such as:
- "You can buy big disks now; one disk per terabyte is a good idea!" (a quick back-of-envelope check on this one follows below)
- "Modern RAID5 SAN hardware is good for 'read only' databases such as data warehouses; the caching can keep up with the occasional disk writes."
The other thing a data warehouse does is process the result set returned from disk, and this often means the data needs to be sorted. For large result sets this cannot always happen in memory, so what happens? We write the sort to disk. And writing large volumes of data to RAID5 disk is potentially a performance killer: at first things seem quick, because the battery-backed cache absorbs the writes, but once we saturate the cache everything slows to the true disk write rate. Personally, I have never seen a SAN with enough cache to cope with a 100GB sort.
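To put some rough numbers on that, here is a minimal sketch of a 100GB sort spilling to a cached RAID5 array. Every figure in it is an assumption for illustration (the cache size, the rate the database generates sort runs, and the sustained RAID5 write rate are not measurements from this post), and the model is deliberately simple: it ignores the cache de-staging to disk while it fills.

```python
# Rough model: a 100 GB sort spills to temp space on a cached RAID5 array.
# All figures below are illustrative assumptions, not measurements:
#   - the database can generate sort runs at ~800 MB/s,
#   - the SAN write cache holds 8 GB,
#   - once the cache is full, writes drain at a sustained RAID5 rate of
#     ~200 MB/s (parity read-modify-write makes each write expensive).

SORT_GB    = 100    # volume of sort data written to temp space
CACHE_GB   = 8      # battery-backed write cache on the SAN
HOST_MB_S  = 800    # assumed rate at which sort runs are written
RAID5_MB_S = 200    # assumed sustained write rate once the cache is full

cache_phase_s = CACHE_GB * 1024 / HOST_MB_S                # writes the cache absorbs
disk_phase_s  = (SORT_GB - CACHE_GB) * 1024 / RAID5_MB_S   # the rest at disk speed

print(f"First {CACHE_GB} GB absorbed by cache in ~{cache_phase_s:.0f} seconds")
print(f"Remaining {SORT_GB - CACHE_GB} GB drains in ~{disk_phase_s / 60:.0f} minutes")
```

In other words, the first few gigabytes fly and everything after that runs at the true disk write rate, which is exactly the "fast at first, then it slows down" pattern described above. A multi-pass sort makes it worse, because the same data may be written to temp space more than once.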
(more to follow later...)