More Notes on Right-Time BI
Over the past couple of years Stewart Bryson and I have been looking into things "right-time" (or is that real-time?). It is great to have him around to trade ideas (and graphics for presentations!). Most of what we have been discussing has been about "traditional reporting", either with or without a data warehouse, and definitely in the realm of "how well have we done". However, that is not the sole use case for right-time BI.
I have long felt that BI is only done for one of three reasons - the law says we must report things, it saves us money, or it makes us money; so if knowing something sooner gives us a competitive advantage then surely that is a good thing. Knowing sooner is not enough on its own, though; it is also about being able to act on the information and make a change in the organization that enhances return (or lowers costs). To my mind we are moving from the traditional "let's look at this in aggregate" stance to a world where we ask "what is the significance of this newly observed fact?". That type of analysis requires a body of data to build a reference model, plus access to smart statistical tools that let us make judgments based on probabilities. Making such decisions on dynamic events is not just for stock markets and bankers; the same principles apply in many sectors. I know of some restaurant chains that have investigated using centrally monitored sales across all outlets to adjust staff levels dynamically based on likely demand - staff are sent home, brought in, or moved between outlets based on a predictive model built on past trading patterns across many outlets.
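To make the "reference model" idea a little more concrete, here is a minimal sketch in plain Python (the figures and the three-sigma threshold are entirely made up): we score a single newly arrived fact against the history we already hold, raise an alert if it looks improbable, and only then let it join the history used to judge the next arrival.

```python
# A minimal sketch (hypothetical data and threshold): score one new "fact"
# against a reference model built from the facts we already hold.
from statistics import mean, stdev

# Reference model: amounts we have already seen (in practice this comes
# from the warehouse or whatever form the repository takes).
history = [42.10, 18.99, 55.00, 23.50, 61.25, 38.40, 47.80, 29.99]

def looks_unusual(new_amount, reference, threshold=3.0):
    """Flag the new fact if it sits more than `threshold` standard
    deviations away from what the reference model expects."""
    sigma = stdev(reference)
    if sigma == 0:
        return False
    return abs(new_amount - mean(reference)) / sigma > threshold

new_fact = 950.00  # the newly observed transaction
if looks_unusual(new_fact, history):
    print("raise an alert and hold this one for review")

# Once scored, the new fact becomes part of the base data set
# we use to judge the next fact to arrive.
history.append(new_fact)
```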
As usual, most of the building blocks we need to do this are already available to us; we just need a bit of creativity to join them together into an architecture. For this kind of use I feel that messaging should be core to the data capture - we want to look at single items of "fact" and do some statistical analysis on them before adding them to the data warehouse (or whatever form our data repository takes), so that the new fact can become part of the base data set we use to analyze the next fact to arrive (a rough sketch of this flow follows the list below). Micro-batch loading of log-based change data is probably less suited here, as:
- we add to the latency by using discrete loads at fixed intervals, and
- processing many items at a time complicates the statistical analysis and alerting phases (after all, if we get 2453 credit card transactions in a batch, only a few will be potentially fraudulent).
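Here is the event-at-a-time flow I have in mind, again as a hedged Python sketch rather than a prescription: the in-process queue is just a stand-in for whatever messaging layer is actually used, and the scoring rule and figures are invented. Each message is scored on arrival, alerted on if necessary, and only then added to the repository that informs the next decision.

```python
# Sketch of an event-at-a-time pipeline: consume one message, score it
# against the current repository, alert if needed, then let the fact
# become part of the data used to score the next arrival.
import queue
from statistics import mean, stdev

incoming = queue.Queue()   # stand-in for the real messaging layer
repository = [42.10, 18.99, 55.00, 23.50, 61.25, 38.40, 47.80, 29.99]

def looks_unusual(amount, reference, threshold=3.0):
    """Very crude single-fact check against the reference data."""
    sigma = stdev(reference)
    return sigma > 0 and abs(amount - mean(reference)) / sigma > threshold

def handle(amount):
    if looks_unusual(amount, repository):
        print(f"alert: {amount:.2f} looks unusual, flag for review")
    repository.append(amount)   # the fact now informs the next decision

# Simulate a handful of messages arriving one at a time.
for amount in (44.20, 912.00, 51.75):
    incoming.put(amount)

while not incoming.empty():
    handle(incoming.get())
```

Contrast that with a micro-batch of 2453 transactions landing at once: the alerting logic then has to pick the handful of interesting rows back out of the batch, which is exactly the complication mentioned in the list above.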