DW Wisdom (2) - more of the physical

Keep all the OLTP tables separate from the DW systems: In the olden days, the database ran on a single machine with one or (more probably) several processors and a direct connection to its disks.

Enterprise storage and server clustering have moved things on a lot since then, but there are still good reasons not to run transactional and DW systems on the same database instance, quite apart from the fact that your OLTP vendor might not support it!

With an Oracle database the following characteristics differentiate DW and transactional systems:

DW | Transactional
--- | ---
Few, long-running queries | Frequent, fast transactions
“Never the same query twice” – no binds | The same queries repeated over and over – binds used to boost performance
Extensive use of sorts, bitmap merge area, hash joins and other PGA structures | Sorts, bitmap indexes and hash joins less common – more focus on SGA sizing
Queries access multiple rows; full table scans and bitmap index reads are common | Mainly single-row access, often by indexed reads
Table scans favour larger block sizes | Default block sizes favoured
Bitmap indexes can boost query performance | Bitmap indexes can kill update performance
Right-sized parallelism is good | Parallel query is not so useful
May only need to recover to the latest good backup and reapply batch data loads – may not need archive log mode | Must plan to recover to the latest transaction as part of disaster recovery – robust archive logging required
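
To make the contrast concrete, here is a minimal SQL sketch – the SALES fact table, ORDERS table and column names are purely illustrative:

    -- DW style: a bitmap index on a low-cardinality key, and a query
    -- that aggregates many rows via a full scan, using literals that
    -- will rarely be repeated
    CREATE BITMAP INDEX sales_region_bix ON sales (region_id);

    SELECT region_id, SUM(amount_sold)
    FROM   sales
    WHERE  sale_date >= DATE '2007-01-01'
    GROUP  BY region_id;

    -- OLTP style: single-row access through a b-tree index, with a
    -- bind variable so the parsed statement can be shared and reused
    SELECT order_status
    FROM   orders
    WHERE  order_id = :order_id;
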
But these conflicting requirements fit well with network storage and RAC. The OLTP system will run on its own set of servers optimised for fast and furious transaction rates, and the DW will use another set of servers optimised for large queries.

Parallel is good: or perhaps more truthfully “some parallel is good, but not all of the time”. There is an overhead involved in starting a parallel query and further overhead in coordinating the results of the query, so if the query is already fast in serial mode there is nothing to gain and perhaps lots to lose by going parallel. A third factor in the speed of a parallel query is that typically you wait for the slowest parallel slave to complete its query; if that slave is slow because it has more work to do or because it is delayed by IO contention then the other slaves will sit there twiddling their metaphorical thumbs until all of the results are in and the query coordinator can crack on.
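
One way to keep control of that trade-off is to state the degree explicitly in the queries that matter rather than leave it to chance; a small sketch against the same illustrative SALES table:

    -- Already fast in serial: the slave start-up and coordination
    -- overhead would outweigh any benefit, so keep it serial
    SELECT /*+ NO_PARALLEL(s) */ COUNT(*)
    FROM   sales s
    WHERE  sale_date = DATE '2007-06-30';

    -- Large scan-and-aggregate query: a modest, deliberate degree of
    -- parallelism spreads the IO across a few slaves
    SELECT /*+ PARALLEL(s, 4) */ cust_id, SUM(amount_sold)
    FROM   sales s
    GROUP  BY cust_id;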

In my opinion, parallel_adaptive_multi_user is not a good setting to use on a data warehouse. The idea that a user gets most of the system's resources when load is light and progressively less as more users pile on sounds very fair to the system and to the people who paid for the processors (after all, they would see all of the processors busy all of the time). But for most users, predictable query performance matters far more than whether a given query takes 1 minute or 1 hour depending on how lucky they were at grabbing enough slaves. And who says the more slaves the better? What if all of the slaves write to temp and your database is IO bound on temp?
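
A sketch of the alternative: switch the adaptive algorithm off and give the big tables a deliberate, right-sized default degree (table name illustrative, and naturally any parameter change should be tested before it goes near production):

    -- Stop the adaptive downgrading of parallel degree under load
    ALTER SYSTEM SET parallel_adaptive_multi_user = FALSE;

    -- A fixed, right-sized default degree for the fact table keeps
    -- query times predictable
    ALTER TABLE sales PARALLEL (DEGREE 4);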

RAID5: despite what storage vendors say, RAID5 is not good news for a data warehouse. Stripe and mirror, yes; parity, no!

Some people believe that a DW is a read-only system, whereas a transactional one is read/write or even write-mostly, and that write performance therefore hardly matters. However, DW systems are in fact heavy users of database writes: the batch processes around data loading are, of course, write-intensive, and even while users are running queries there is often intensive write activity as large structures spill to the temporary tablespace during sorts, aggregations and hash joins. This write activity is more than capable of swamping the battery-backed cache within a storage array, and performance degrades while the disks catch up with the backlog of writes – a backlog that the RAID5 parity write penalty only makes worse. The delay is especially annoying when it is the writing of temporary structures to disk (sorts etc.) that is causing the problem.
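
One quick way to see how much of the write load is temp traffic rather than ordinary data file writes is to compare the relevant statistics in v$sysstat, for example:

    -- Direct writes to temp (sorts and hash joins spilling to disk)
    -- versus total physical writes
    SELECT name, value
    FROM   v$sysstat
    WHERE  name IN ('physical writes',
                    'physical writes direct',
                    'physical writes direct temporary tablespace');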