It’s a mater of integrity

In a data warehouse there are typically two ‘chunks’ of data – the public stuff (such as reference or dimensional data, raw fact data, and the aggregated fact data.) The other chunk is the ‘backroom’ layer that is used to build the public side of the warehouse.

For me, the backroom deals with ETL - applying business rules to the data and especially the enforcement of data integrity. I am firm believer in the principal of never trusting that the incoming interfaces to my systems are 100% reliable. I never assume that primary keys are indeed unique and not null (nor do I assume that the first duplicate (if you know what I mean) is the true value) neither do I assume that parent keys are always present. Data that fails these checks are rejected. For most of my customers that means ‘parked’ to one side and dealt with during office hours, on-call support do not want to investigate data problems at 3:00am!

Reference data are loaded into staging tables with integrity constraints disabled, table by table (starting at the highest hierarchy levels) constraints are enabled and any exceptions captured and the offending rows removed from the staging area. The stage tables are then compared against the published reference tables in the public layer (usually SQL minus) to produce a result set that just contains new or changed records. This in turn is published by an SQL merge query. I tend to use a different approach for fact data. Instead of using constraints to reject data the fact data is outer-joined to the reference data foreign keys. This is then used to drive a multi-table insert where invalid data goes to the reject table and valid data proceeds to publication (sometimes via partition exchange). This sounds hard to describe but is trivial to draw out in Oracle Warehouse Builder

By putting all the integrity checking in the staging area we can run a DW system with a enforced clean data in the published layer. This in turn allows us to use Oracle RELY constraints on any foreign keys in the reference or fact tables.