Quality thoughts as I continue to chill out
Over at another blog, Beth notes that she is editor of the month for the Carnival of Data Quality. When it comes out I urge you to take a look. By one of those strange quirks of global IT companies, Beth and I once worked for the same employer, but we have never met. I did see her photo once on the internet, and if she donates to charity I'll tell her where!
So what has data quality got to do with BI? Everything. For a BI system to be successful you need to fulfil three objectives:
• It must tell people what they need to know - by that I mean it must encompass enough detail to have real use
• it must tell people what they need to know rapidly enough
• and it must tell people the truth
The last item is something that has to be designed in from the beginning of a BI project; performance and scope can be enhanced later, but quality can't, or at least not without having to replace already loaded data.
When I get involved in an ETL project I probably spend less than a quarter of my time building ETL code and getting it to run; the majority of my time goes on data quality and on finding and explaining anomalies. Where we can, we get data fixed at source, but sometimes that is not going to be possible. In those cases data profiling allows us to formulate rules to handle the expected exceptions (whether these are auto-fix procedures or a "park it to one side and let a human make the call" rule). And then there is the unexpected exception, which of course must be handled too.
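To make those three paths concrete, here is a minimal Python sketch of the idea, not code from any real project. Every name in it (Rule, Result, process, is_valid, the country and qty fields, and the two sample rules) is a hypothetical stand-in for whatever profiling would actually turn up. A rule with a fix function is an auto-fix; a rule without one parks the row for a human; anything that no rule anticipated lands in an "unexpected" bucket instead of being loaded.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Row = dict  # one source record; the field names used below are made up


@dataclass
class Rule:
    name: str
    matches: Callable[[Row], bool]               # True when the row shows a known anomaly
    fix: Optional[Callable[[Row], Row]] = None   # None means "park it for a human"


@dataclass
class Result:
    loaded: list = field(default_factory=list)
    parked: list = field(default_factory=list)       # (rule name, row) pairs
    unexpected: list = field(default_factory=list)   # (reason, row) pairs


def is_valid(row: Row) -> bool:
    # Hypothetical final check applied after all rules have run.
    return isinstance(row.get("qty"), int) and row["qty"] >= 0 and bool(row.get("country"))


def process(rows, rules) -> Result:
    result = Result()
    for row in rows:
        try:
            parked = False
            for rule in rules:
                if rule.matches(row):
                    if rule.fix is None:
                        # Expected exception with no safe automatic fix.
                        result.parked.append((rule.name, row))
                        parked = True
                        break
                    row = rule.fix(row)  # expected exception, auto-fixed
            if parked:
                continue
            if is_valid(row):
                result.loaded.append(row)
            else:
                # No rule anticipated this problem: the unexpected exception.
                result.unexpected.append(("failed validation", row))
        except Exception as exc:
            # A rule itself blew up on the row; keep the row rather than lose it.
            result.unexpected.append((repr(exc), row))
    return result


# Two illustrative rules of the kind profiling might suggest.
RULES = [
    Rule(
        name="lowercase country",
        matches=lambda r: isinstance(r.get("country"), str) and r["country"] != r["country"].upper(),
        fix=lambda r: {**r, "country": r["country"].upper()},
    ),
    Rule(
        name="negative qty",
        matches=lambda r: isinstance(r.get("qty"), int) and r["qty"] < 0,
    ),
]

SAMPLE_ROWS = [
    {"country": "uk", "qty": 3},      # auto-fixed, then loaded
    {"country": "FR", "qty": -1},     # parked for a human
    {"country": "DE", "qty": "ten"},  # nobody wrote a rule for this
    {"country": "US", "qty": 5},      # clean
]

result = process(SAMPLE_ROWS, RULES)
```

The point of the shape rather than the detail: nothing that fails a check gets loaded silently, and the parked and unexpected lists are exactly the anomalies that then need finding and explaining.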