Database performance

One of the medium sized DWH I run is to have all the aggregate summaries rebuilt in few weeks time. The customer is replacing the old mainframe with a new ERP system, and at the same time restructuring their reporting structures and hence asked that we change all of the existing product codes to the new values.
We have not rebuilt all of the summaries on this DWH since we migrated it from a 24 CPU sparc2 box with a barn load of 9gb disk to the new high speed Sun box on a SAN. Less CPUs but so much more grunt. I got one of my team to estimate the rebuild time base on the amount time the average daily batch takes... bad news it would take 7 days of 24x7 working, about twice as long as on the old box. So we then looked at the individual jobs in the batch. Three took far longer than the others (tens of minutes against tens of seconds) so as that was the most pain we targeted them. The longest running of these was the base daily sales summary, this table holds about 600,000,000 rows and is partitioned by day (812 partitions) and is built from the aggregates of the daily POS LINE and POS_HEADER tables (again 812 partitions each) joined to the customer, store, product, original supplier, product type and calendar dimension tables.
So we looked at the build method - truncate partition followed by append parallel insert. So that looked good. Then the SQL for the insert... gotcha! - the common column for the POS LINE and POS HEADER is the document number and that was in the join and we joined on date to the header date. But what was missing was the join of date to the POS LINE. So instead of finding a document number in a single partition we scanned the whole .6 billion row source table for each document.... YUK!
Build time now sub 3 minutes, saving 12 minutes - moral if you join partitioned tables make sure that you join to partition key for each table
The other tables would not have the same problem as we were building summaries over our base summary. The seemed nothing fundamentally wrong with our code so we change the way we built the summary. Instead of the above truncate partition followed by append parallel insert we moved to. Create intermediate table in same tablespace as partition as select * where 1=2 from the summary. Insert append parallel into intermediate table, partition exchange, rebuild the indexes and we are done... saving 4 minutes.
Now to work out why exchange is quicker than insert for some summaries but not all. But having targeted the worst offenders we can get the job done in 2.5 days.