Data Warehouse speed

I see that David Aldridge is buying a new laptop, now that is a dilemma! Do you go for the biggest, fastest CPU, the most RAM, the best display or choose something more modest?. Of course it depends what you are going to do, I just need a machine that is plain vanilla enough (no fancy hardware with its incompatibility problems) to run virtual machines successfully for various databases and query tools, and to have the connectivity tools to allow wireless and Bluetooth access on the move and the ability to connect to a highspeed network to allow me to work with some "industrial strength" servers.

Although I can test concepts on a laptop PC, the one thing I doubt I could do is to look at some aspects of data warehouse performance, and that is down to insufficient IO capacity, or perhaps more correctly bandwidth. Most laptops have a single hard disk, most laptops only have one network port, USB 2 is not really fast enough to run multiple external drives. Although I can plug a lot of disk into a laptop, I doubt if I can get good data throughput.

More on speed

Testing, or benchmarking, data warehouses for query performance is somewhat difficult, after all what is a reasonable query workload in either the number of concurrent queries or size of the dataset being manipulated? And when you do get a measure of performance, is it scalable?

Obviously, we can't extrapolate performance from

select sysdate from dual;

and use that to estimate the performance of a query to find the total number tubs of ice cream sold by all of the 7-11s in Seattle in August. But can we scale from the results for a single store or a single day? Even if the query optimiser decided to tackle a single day and a whole month in exactly the same way would be able to apply some simple (not necessarily linear) function to estimate the query time of 31 day's worth of data against one?

One factor that can "throw a spanner in the works" is when the almost ubiquitous sort operations beloved of data warehouses moves from PGA memory sort to disk sort and then, in the case of those systems on RAID 5 (and there a lot of them), when the disc write buffer becomes saturated and the write speed drops to the physical disk write speed, each step seeing a dramatic drop in performance.

And don't get me started on parallel queries that end up spending more time waiting than doing.