Last time I mentioned in passing...
... our use of an index organized table in our ETL process and I promised to jot a quick note on it. So here goes.
There are various combinations of database functionality that I do not like mixing in a data warehouse. For instance I impose rigorous RI checking in the staging layer of my typical three layer design and only publish clean data to the middle and thence the third tiers. I do however use NOT NULL and RELY constraints extensively in the published layers so that the cost base optimiser knows as much about my data set as possible; and besides without these constraints query rewrite will not work fully.
I usually take one of two approaches on the stage layer RI check, for dimension hierarchies on reference load I use plain old foreign keys and capture any exceptions as part of load process. But for fact load where I have 7 or more dimensions to validate I take a slightly different tack and use outer joins on the dimension keys and look for nulls. This sounds slightly complex but it is a piece of cake in OWB to write a map like that and by using a splitter in the map to multi-table insert the good stuff into a load table and the bad stuff (together with reject reasons and date stamps) into a reject table we get good performance.
But back to the IOT point. We had to implement a business rule that if we received the same invoice number twice we had to reject the record. Now, using an enforced unique constraint do this has a couple of major problems with my design - I would need a real index on my fact table and as this is partitioned we we need to have a global (that is, not local) index to support it; and I find that global unique indexes on partitioned DW table just don't mix. Step in the IOT; we define a two column IOT to contain the document number as a primary key and a datestamp for use later when we housekeep old records from the data warehouse. We add our IOT to the dataload map as an outer join and reject records where the incoming invoice number matches an existing invoice. At the end of the load we append all of the new good invoice numbers into the IOT ready for the next batch. Not elegant, but fast enough to use, and more importantly it works.
Changing the subject: Doug Burns is commenting on the quality of DBA candidates for a post at his current employer. Buried in the post is a great interview question on cursors. Now I would love to ask people about illicit cursors ;-)