Data Warehouse Healthchecks and OWB training
Rather strangely as Mark jets off to Greece for a weeks holiday, I am also off there for a few days work (he's here next week working as well).
Looking back over the first few months of the company a lot of the work we have been doing has been either Data Warehouse healthchecks or OWB training.
The healthchecks have been for a number of different companies, each with a different Data Warehouse set up. A packaged Data Warehouse solution that was neither loading or querying in a performant way, another Data Warehouse designed to integrate data from a number of different subsidiaries, each with different ERP systems holding their data in different structures, others just not hitting the mark with load and query times.
The problems are generally:
- Not loading in the required timeframe
- There is no quality in the confidence of the data
- Query response time is too slow
The process we run through needs to be tailored for each organisation, unfortunately there is no 'one size fits all' approach to this, however one of the objectives over the next few months is to try and formalise this process and create/publish a methodology or approach we use for this. This would go further than a database tuning exercise, and would encompass areas like:
- Verification/understanding of the business requirements (do they want and need a Data Warehouse?)
- Review the data model (is there a design for this piece of 'software'?)
- Review the ETL process (will it really load in 3 hours, do they really mean real-time?)
- Data quality (data what?)
- Review the database (do the Data Warehouse features used approximate to 7.3.4 or 10gR2?)
- Aggregation strategy (how much work is the database doing to answer aggregated queries)
So what have we found - well, there may be a number of reasons why things may not be well on the good ship Data Warehouse. Warehouses tend to grow organically, responding to user requirements, organisational changes and merger/acquisitions when needed.
This can dilute the design, stretch the ETL process and invalidate some of the fundamental design decisions taken about the Warehouse. The use of standardised design patterns and a high level data model can help rectify this. Although sometimes seen as an over-simplification, it can provide a basis for putting the Data Warehouse back on track. A high level design can work as a medium that all the stakeholders can work round.
Another important issue is how the business works with the IT team. The development can be led from either the IT department or the business. If it lead from an IT side there is often an understanding of the kinds of requirements it is likely to meet, but there is also an aspect of the Field of Dreams 'build it and they will come'. More often the requirements are driven by the business. This makes it easier to establish business
sponsorship and ownership, and to be honest more likely to succeed. One of the problems is establishing expectations for business input, its not just a case of them providing some requirements and then walking away. One method that has proved helpful is
to use contract-like documents such a Acceptance Criteria or Service Level Agreement to set expectations across the whole project.
The bottom line is Data Warehouse projects must be run in the same way as other Software Development projects. If there are no clear requirements the project may well drift and a lot rework could be required. If there is no strong configuration management
process used then you can't be sure of what is being release, and when. If no design is employed, you cannot guarantee the user requirements are met, or that your ETL or query processes are efficient. If there is insufficient testing then users will not have confidence in the data. If you follow standard Software Development techniques, however traditional or agile, you will have a better chance of success.
So what next? As I mentioned above we are looking to further formalise the process and enhance our methodology. I am also submitting a couple of papers for the UKOUG, one talking around some of the Data Warehousing issues mentioned above and another focusing of some OWB best practices and advanced techniques.
So to conclude, does anyone know why you can't use laptops and iPods on aeroplanes during take-off and landing? Will it make them crash, should we be worried, are they not telling us something?