DW Appliances
David Aldridge is writing about disk I/O schedulers and table read performance under Linux. In passing he mentioned DW appliances, which are an interesting topic in their own right. Rather than drag the discussion on his blog well off-topic (or at least off his topic), I thought I'd write a bit here.
The latest Gartner Magic Quadrant report on DW has the appliance vendors (now in the plural) moving on up (if that is a valid way to view the report). Netezza has perhaps the longest history in the field and the most implementations, but the other vendors in the physical appliance space have also started to ship product. There are also a few vendors in the virtual space: software to run on your own hardware.
For now I will talk about the one-stop box, the unwrap-it-and-plug-it-in device. In a way this is similar ground to that occupied by Teradata from NCR, and perhaps the IBM appliance offering, but with a far smaller price tag. In general the vendors take open source databases and operating systems such as Linux, add some proprietary know-how in software and/or hardware, and ship something that is relatively quick and easy to set up.
Each vendor has its own take on getting performance, but the main point is to reduce the amount of data fetched from any single disk to the minimum. Netezza does this by hash partitioning data over many physical disks (56 or 28 disks = 3 TB). Each disk has its own processor, connected to the main Linux processor by a gigabit channel. Queries are fired against the disk processors, and only the selected columns from the rows that match the query predicate are returned to the central CPU. This minimises the traffic on the main IO channels, but not at the disk level. The Netezza design does not use indexes; data is found by reading it all in. This brute-force technique is very effective, and highly cost effective: the processors used on the disk array need not be expensive, and I believe they are even considering game console CPUs, as these are cheap and have very low power consumption.
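A rough sketch of the shared-nothing scan described above may help. It is a hypothetical in-memory model, not Netezza's code: `DiskUnit`, `distribute` and `query` are made-up names, and crc32 simply stands in for whatever hash the appliance really uses. Rows are hash partitioned across disk units; each unit scans all of its rows (no indexes), applies the predicate, and returns only the requested columns, so the host merges a small result set.

```python
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]


@dataclass
class DiskUnit:
    """Hypothetical model of one disk with its own small processor."""
    rows: List[Row] = field(default_factory=list)

    def scan(self, predicate: Callable[[Row], bool],
             columns: Sequence[str]) -> List[Tuple[object, ...]]:
        # No index: every row stored on this disk is read...
        # ...but only rows matching the predicate, projected to the
        # requested columns, are sent back to the host.
        return [tuple(row[c] for c in columns) for row in self.rows if predicate(row)]


def distribute(rows: List[Row], key: str, n_disks: int) -> List[DiskUnit]:
    """Hash-partition rows across n_disks units on the value of `key`."""
    disks = [DiskUnit() for _ in range(n_disks)]
    for row in rows:
        # crc32 gives a stable hash across runs (Python's str hash is salted).
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        disks[h % n_disks].rows.append(row)
    return disks


def query(disks: List[DiskUnit], predicate: Callable[[Row], bool],
          columns: Sequence[str]) -> List[Tuple[object, ...]]:
    """Central host: fan the query out and merge the small filtered results.

    An appliance would run the disk scans in parallel; this sketch just loops.
    """
    result: List[Tuple[object, ...]] = []
    for disk in disks:
        result.extend(disk.scan(predicate, columns))
    return result


# Usage: 30 sales rows spread over 28 disk units.
sales = [{"id": i, "region": "EU" if i % 3 == 0 else "US", "amount": i * 10}
         for i in range(1, 31)]
disks = distribute(sales, "id", 28)
rows_read = sum(len(d.rows) for d in disks)   # every row is scanned
eu = query(disks, lambda r: r["region"] == "EU", ["id", "amount"])
```

In the usage lines all 30 rows are read at the disk units, but only 10 two-column tuples travel back to the host. That is the trade-off in the paragraph above: less traffic on the main channels, but no saving at the disk itself.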
David made a very good point about performance:
But with a combination of detailed partitioning and read rates sustainable near the theoretical maximum for the devices Oracle ought to be able to give the appliances a pretty good run for their money.

In my opinion, DW appliances generally perform well even if the data is just slapped in with no real thought given to design, but they get their answers by brute force. Oracle and most other RDBMS-based DW systems need good design to perform well, and that means storing data in such a way that only the bare minimum has to be read to provide the answer.
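To make the design point concrete, here is a toy model of range partitioning with partition pruning, the kind of detailed partitioning David refers to. It is not Oracle's implementation or syntax; the class `RangePartitionedTable`, its monthly scheme and the column name `sale_date` are all hypothetical. A query over a date range skips any partition that cannot hold matching rows, so it reads only a fraction of the table instead of scanning everything.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Tuple

Row = Dict[str, object]
MonthKey = Tuple[int, int]


def month_key(d: date) -> MonthKey:
    return (d.year, d.month)


class RangePartitionedTable:
    """Hypothetical table range-partitioned by the month of a date column."""

    def __init__(self, date_column: str) -> None:
        self.date_column = date_column
        self.partitions: Dict[MonthKey, List[Row]] = defaultdict(list)

    def insert(self, row: Row) -> None:
        # Each row lands in the partition for its month.
        self.partitions[month_key(row[self.date_column])].append(row)

    def scan_between(self, start: date, end: date) -> Tuple[List[Row], int]:
        """Return (matching rows, number of rows actually read)."""
        lo, hi = month_key(start), month_key(end)
        matches: List[Row] = []
        rows_read = 0
        for key, rows in self.partitions.items():
            if not (lo <= key <= hi):
                continue  # pruned: this partition is never read
            rows_read += len(rows)
            matches.extend(r for r in rows if start <= r[self.date_column] <= end)
        return matches, rows_read


# Usage: one row per day of 2007 (365 rows in 12 monthly partitions).
table = RangePartitionedTable("sale_date")
for i in range(365):
    table.insert({"sale_date": date(2007, 1, 1) + timedelta(days=i), "amount": i})

hits, read = table.scan_between(date(2007, 3, 15), date(2007, 4, 10))
```

Here a query from 15 March to 10 April touches only the March and April partitions (61 of 365 rows) and returns 27 matches. If the partition key does not line up with the queries, nothing gets pruned and every row is read, which is why this approach depends on good design where the appliance does not.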