Time Series Revisited
One of the perverse things about BI is that what sounds difficult is often easy and what sounds simple is hard. If someone asked me how many bunches of flowers were sold on Mother’s day 2006 or how many people bought both ‘item A’ and ‘item B’ in Seattle on 13 June 2006 there is a reasonable chance (assuming that the data is in my DW subject domain!) that I would be able to answer the question quickly. But ask me how much money we have taken this fiscal year and quick (or simple) answers become a bit more elusive.
The problem with year to date (YTD) measures is that potentially they need access a large amount of data to compute the result: for a DW with three year’s worth of sales history YTD would worse case access 1/3 of the data to calculate YTD sales and if you compare year-on-year YTD performance you would hit almost 67% of the data. Of course the query involved is simple (it’s just a matter of supplying all of the dates to aggregate over) but is from the user point of view frustrating in that the query time increases as the year progresses.
A common strategy used to deal with this problem is to store YTD values along with the sales aggregates in the data warehouse. This is often acceptable for high degrees of aggregation (total company, region or sales channel) but rapidly becomes unworkable as the dimensions become sparser. For example, if you record YTD by product and customer then that sale of a diary on 1 January has to be maintained as part of the YDT value right through to the 31 December, but how likely is that customer to buy more diaries through the year? And what happens if there is a problem with communications with a store and their sales data arrives a month late? How many YTD values need to be recalculated?
As I said above, calculating YTD in SQL is basically simple, we need to convert the single query date into a range of dates and aggregate over that. The challenge is to do this is a way that reporting tools can use and without preventing features such as query rewrite from working.