Data Warehouse health-checks

Like cars and people, a data warehouse can benefit from a check-up to keep things in top condition. Unlike cars and people data warehouses don't tend to have parts that wear out but DWs do age; either technology moves on and new, more efficient, techniques can utilised or the original design no longer matches the business requirements.

Although most health-checks I carry out are on DWs I did not design, I occasionally get invited to review "one I prepared earlier" to use a phrase beloved of TV cookery programmes. For me there is no difference between the two types of review - I still need to look at how things are configured, how clean is the data, how the data load works, the backup and recovery process and a myriad of other things (or as few things as you wish to pay me for to be strictly mercenary)

One recent review of a system I designed a few years back has lead to me to think that the documentation set should have also include a "Why" book; a guide for the people maintaining the data warehouse to explain the reasons behind why things were done a certain way. For example:

  • Why I wanted to use file based storage (not raw) with direct io to maximise performance. This needs OS and database settings set (or at least it does for this environment) but sometime in the last few years the direct io had been disabled.
  • Why partitioning fact data by day gives more potential for partition elimination than partitioning by week (range) with hash sub-partitions on product key (for this particular data set and query load)
  • Why I really wanted a large number of smaller tablespaces and not just one (very large one) to store all of the sales summaries
  • Why we needed to create dimension objects
  • ...and RELY constraints
  • Why we need to run our backup straight after the dataload (no archive log)
  • Why we can not host the query tool repository in the data warehouse (same reason)
  • Why we disable a bitmap index partition after truncating a table partition and not before
  • Why append parallel might waste space on small inserts.