You Probably Need Parallel Except When You Don't
My last post looked at some aspects of performance tuning that would make traditional DBAs blanche with horror, not once did I mention indexing, partitioning, PCTFREE or parallel. This was not because I don't consider them important, but more that the simple test of "am I doing the right thing" is often overlooked. To appease the DBA in me I will write occasional blog posts on some database aspects and performance. Here's the first.
If you are running a large Oracle data warehouse you should be using parallel. There I said it! I think this fundamentally to be true, especially so with engineered systems such as Exadata. As we know, Exadata is a "balanced system"; it is designed to perform well if all components of the system are used in balance, and parallel can certainly exercise hardware resources which are otherwise idle in a large serial query.
Like all tools you have to use parallel correctly; no more would we think of using a wrench to hammer a nail then should you think parallel is the answer to all performance problems. Sometimes parallel will make things worse, sometimes parallel will make performance less predictable.
Parallel introduces additional work to a query, simplistically we need to: split the query into multiple parallel processes, execute them, wait for the processes to complete and finally coordinate the results. This all takes time to do. Our time saving comes from being able to process multiple smaller chunks of data simultaneously. If the time to execute the step in parallel is not significantly faster than doing it without parallel then the additional overhead may make parallel processing a slower option; this is typically the case with small tables where a full tablescan or an indexed access is fast. Use too few parallel processes and we will not gain much in performance, too many and we risk starving the database of resource for other work or even slow our own process as it waits for resource. If you have implemented some form of CPU resource management on your system you may find that you experience delays as your parallel slaves 'wait their turn'
Having worked for a computer services company in the past (I managed support and development for several large data warehouses) I got to know that "predictability = good" and "unpredictability = bad". People expect that a report takes 20 seconds to run or the overnight ETL processes will complete by 07:30 every morning. People don't like that 20 second report taking between 2 seconds or an hour depending on what else is going on; of course they always will want the report taking 2 seconds, but that is another problem. Adaptive parallel is one of those features that sounds a good idea until you try to implement a production service with service level or operational agreements with the users; the idea that the database allocates parallel process based on available processing capacity would be good except for two flaws: once a query starts it can't make use of additional parallel resources released when other parallel queries complete and the query will execute even if there is not enough parallel resource to do so - in the worst case this means it will run serially and likely very slowly.
Oracle improved on this a lot in 11gR2 with Automatic Degree of Parallism, AUTO DOP. Gwen Shapira wrote a nice piece on AUTO DOP a while back. This feature also introduces parallel statement queuing, that is, if a query can't access the full degree of parallel it needs it waits until it can obtain the required resource. This does not completely solve the problem of variable response time but it should avoid queries degrading to a lower degree of parallel because of insufficient available parallel resource; waiting five minutes and then running in 10 seconds is much better to my thinking than running in two hours. It is possible to avoid being queued by using a query hint (NO_STMT_QUEUING) , however we then run the risk of having reduced resource for execution unless we reserve sufficient headway in the system by setting parallel_server_target to an appropriate value.
From my experience of AUTO DOP, DOP can be too aggressive for ETL DML processes, of course there are a lot of things we can do to reduce the amount of parallelism requested, but perhaps the best approach is to reserve AUTO DOP for user queries. This is amazingly simple to do as AUTO DOP can be set at SESSION LEVEL. In my ideal world, ETL runs with traditional hand-tuned set-based SQL using explicit parallel and the reporting users get AUTO DOP with its query queuing to give the most throughput. If ETL runs at the same time as reporting (more and more common these days) we should lower the parallel_server_target to ensure we have enough resource for the ETL to run without degradation.