Big, bad disk.

Over on Doug Burns’ blog there is a link to an interesting piece on large disks. You might think the data warehousing community would welcome large disks, but for the majority of us (those using conventional relational databases) this is probably not the case. One exception may be people who use data warehouse appliances: there, data is hashed across all the available disks in the system and predicate processing is pushed out to the on-disk processors, so the system processor only sees the data after predicate and column filtering. Very few of these appliances exist in the wild; I would guess low hundreds worldwide.
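To make that appliance idea concrete, here is a minimal sketch of hash distribution with push-down filtering. The row data, disk count, and function names are all illustrative assumptions, not any vendor’s actual design: rows are spread over “disks” by hashing a key, each disk’s worker applies the predicate and column projection locally, and the central processor only ever sees the trimmed result.

```python
from collections import defaultdict

NUM_DISKS = 4  # illustrative disk count

def distribute(rows, key="order_id"):
    """Hash each row's key to pick a disk, mimicking hash distribution."""
    disks = defaultdict(list)
    for row in rows:
        disks[hash(row[key]) % NUM_DISKS].append(row)
    return disks

def disk_worker(rows, predicate, columns):
    """Per-disk processing: apply the predicate and project columns at the disk."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]

# Hypothetical order rows.
rows = [{"order_id": i, "region": "EMEA" if i % 3 else "APAC", "amount": i * 10}
        for i in range(1, 13)]

disks = distribute(rows)

# The "system processor" only receives pre-filtered, pre-projected data.
result = []
for rows_on_disk in disks.values():
    result.extend(disk_worker(rows_on_disk,
                              predicate=lambda r: r["region"] == "APAC",
                              columns=("order_id", "amount")))
print(result)
```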

Disk drives can only read from one location on the disk at a time. True, the data transfer rate may be high and the time to jump between locations low, but this is still a slow process compared with CPU and memory operations. If all of a system’s data sits on one disk, then every read and write must go through that single point and IO throughput will suffer. This would be particularly noticeable in a data warehouse, where table scans are common events.
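A quick back-of-envelope calculation shows the effect. The figures below (table size, per-disk transfer rate, disk counts) are illustrative assumptions, not measurements, but they show how scan time shrinks once the same data is striped over more spindles.

```python
TABLE_SIZE_GB = 500     # assumed size of the table being scanned
DISK_RATE_MB_S = 100    # assumed sustained sequential read rate per disk

def scan_seconds(size_gb, disks, rate_mb_s=DISK_RATE_MB_S):
    """Time to scan the table when its data is striped evenly over `disks` drives."""
    return (size_gb * 1024) / (disks * rate_mb_s)

for disks in (1, 4, 16):
    print(f"{disks:>2} disk(s): {scan_seconds(TABLE_SIZE_GB, disks) / 60:.1f} minutes")
```

With these numbers, one disk takes roughly 85 minutes to scan the table, while sixteen disks bring it down to about five, which is why packing everything onto a single large disk is the wrong trade for a warehouse.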