Status changes over time
Curiously I am working on two very similar data architecture projects at the moment, both small scale data warehouses, and both in similar subject domains. Most of the data warehouses I get involved in the are effectively a series of snapshots of a set of measures for discrete time slices be it average stock holding in a week, sales per day, calls per month. These two, however, are more aligned to track events that happen to an item over time.
Both of the current projects are based on CRM processes that monitor the progression of customer queries from first contact to resolution. This type of requirement is amazingly common and can occur in any business sector where a case is raised and tracked, common examples include support desks, insurance claims and customer complaints systems; closely similar systems exist in engineering where (say) maintenance activities on aircraft components are tracked over time.
Suppose, the system was for my burger bar owning customer, and that they had 100 odd outlets. Somewhere in their corporate office there would be a team that handled complaints from customers that broke teeth on bones in the meat, burnt their mouths on hot food or fell on wet floors. The complaints staff would log the problem in there complaints management system and periodically up date the complaint as it progresses to resolution. On the way the status of the complaint will change through various states such as logged, under investigation, settled (and perhaps a myriad of other steps on the way). But for reporting we want to know things such as complaints logged per day per outlet, number of open cases at month end or total compensation paid per quarter.
Dimensionally modeling such things can get a little tricky - certain things such as outlet and date drop out quite easily in my burger chain, maybe we can also use server if we are trying to spot the accident prone or perhaps customer if we are looking for fraudsters. But how do we handle the 'case'? A single case has a lifespan, before it is first logged it does not exist in the data warehouse, after it is resolved it probably won't reappear (this may not be the case for IT support calls ;-) ) We could create a case dimension and have case status as a measure, but this does not lead to clean looking queries as we will almost invariably use a predicate on the fact measure. Or we could make case status a dimension, or make...
The more I think about this the more I am tending to favour measureless fact tables, that is just a skinny set of dimension keys (including case status) and key to join to the detail of the case for those whose interest is more in mining than reporting.