Poor queries

One of the things I do on the DWHs I manage is to periodically review the nature of queries being run so that we can come up with performance improvement plans. When a data warehouse is put in the designer (me) guesses the aggregates and indexes that will be needed to give the required performance, after a few months of user activity (less if I get things grossly wrong) it is time to look at what is being asked and where the pain is in getting those answers. Sometimes there is the need to add extra aggregates and indexes, sometimes we need to remove summaries that are expensive to build but never or seldom get used. This of course in an ongoing process, users’ requirements change overtime as business needs move on.

On Friday I was scanning active queries on one customer’s DWH and spotted the query from hell being run against the raw fact tables. These tables store every single sales transaction by item, customer and store for the last 116 weeks. Most users can not even see these tables – the tables are seldom used (except as a source for aggregations) and are not indexed, but they are range partitioned by day. A user was searching the whole of sales history for any customer at any store that spent exactly £1234.56 (amount changed to protect the guilty) in a single transaction. Even if the table was indexed this query would never fly; it would require half a billion rows to be summed up (grouped by date, customer, store and invoice number) and looking for a specific total spend.

Suggestions welcome on how to make this type of query fly….

By the way I found the user responsible (an IT department worker) they apologised for being terminally blonde.