Rapid development using OBI EE (and what about the Data Warehouse?)
We are often told that one of the benefit of OBI EE is the speed and ease of development. Data sources can easily be added into the system, users can then quickly build queries and the results are easy to distribute. While I completely support this, to me this leaves a few questions when you go beyond the slick sales demos: do we need a Data Warehouse? How do we deal with data quality? How do we test? How do we ensure our great looking reports get from our development environment to our production environment and still display the same data?
<p>This posting just concentrates on the Data Warehouse. Think of the project if we didn’t need a Data Warehouse, initially in terms of cost: no database license required, no <span class="caps">ETL</span>/Data Warehouse development required, no ongoing maintenance and maybe in the eyes of the business no black hole of money and time where the hairy developers go away, grumble about data quality and take longer than everyone thought. Think of all this new way from a business point of view: there is a new reporting requirement, they sit down with a business analyst, add the data source into the physical and then business model, graphically create the joins, and 10 minutes later the data is on the <span class="caps">CEO</span>’s dashboard, marvelous, and not a hairy developer type to be found anywhere in the process, so no cost and no hold-ups. But wait a minute: why have we been spending money developing Data Warehouses over the past couple of decades?</p>
<p><em>What was the Data Warehouse actually doing?</em></p>
<p>For starters, they prevented queries being run against the live transactional systems. Query and analysis tools can be very powerful, and hence can generate complex queries. If each of these queries was being run against the live systems then performance would be impacted, plus the reporting system would be dependent on all the systems being live all the time. Thus loading all the data into a ‘reporting database’ protected the transactional systems.</p>
<p>But what about our real-time Data Warehouse, managers need up to the minute information to make informed decisions? Real-time Data Warehouses can be a double-edged sword, it is great to have up the minute information, but the downside is that reports keep on changing, it becomes more difficult to reconcile information or to get a consistent view of the organisation. Sometimes it is actually useful to have a static view of the data for 24 hours. Real-time Data Warehousing can also be implemented using replication features like Change Data Capture in the database if required.</p>
Data Warehouses also offer a number of functional advantages:
- they can store historic data that may have been archived from the transactional system.
- they can store the history of dimensions, so facts can be correctly categorised when they happened.
- they can store data from different systems.
- they can store the data in a way that makes it easy and efficient for users to query, this means that a large volume of data can be accessed and used effectively.
- they can use features like partitioning, bitmap indexes, materialized views to further speed up those queries.
<p><em>What were those hairy developers actually doing?</em></p>
<p>They are writing <span class="caps">ETL</span> programs (or using a tool to) transforming the data from a number of disparate systems, and possibly a number of disparate organisations so that it can be presented in a unified model. This process can be easy for a small young company, however for larger company, with a history of acquisition this is far more complex. The reason this process can take such a long time is that it can be very hard to combine data from different systems. In the world of <span class="caps">OBI EE</span> we would create a join between the different tables – this is great in theory, but only really works if the keys are aligned between the systems. Generally they are not. Add to this the fact that the data quality maybe of poor quality and suddenly you find that the join is ‘losing’ a large number of records, or they are being incorrectly categorized.</p>
<p>The <span class="caps">ETL</span> process, and the hairy developers, will address this process and actually get in a state it can be compared across systems. I have worked on projects where the <span class="caps">ETL</span> team have lead/driven the business in the whole data quality and cleansing process.</p>
<p>So to turn the argument on its head, if we can do all this with the <span class="caps">ETL</span> process and in our Data Warehouse, and this does add value, why are we hacking things together in <span class="caps">OBI EE</span>, surely then everything should be done at the back end? Again there is some truth in this, if we are going to define complex time series calculations, and we can do them during the <span class="caps">ETL</span> phase, then it supports the ‘one version of the truth’, it means everyone will be using the same calculation and it can be rigorously tested.</p>
<p>If we had no constraints on time or money we may do everything at the back-end, or in the <span class="caps">ETL</span>, however the reality of it is that there just isn’t time (or money), or we often don’t know what the requirements are until later in the project. If we have written the <span class="caps">ETL</span> code and it is tested and working correctly and the business decides it needs another calculation then it may be a lot more practical to write it in situ in the report, or in the business model than to rework (and hence retest) the <span class="caps">ETL</span> code.</p>
<p>This is where <span class="caps">OBI EE</span> and an iterative development methodology come in. In a typical BI/DW project requirements are initially stated at the outset of the project, then they are then added to, changed and refined throughout the process. Users do not know what they want to report against until they see the data in a report in front of them – it is only then can they see the possibilities. To take the example of the calculation and how this could be implemented using this methodology and <span class="caps">OBI EE</span>: initially it could be prototyped directly in Answers. It could then be reviewed, refined and tested. Once agreed the calculation could be added to the business model, this way it would be available to all the users and they could add it to reports, this may lead to changes, or finding cases where it doesn’t work correctly. If the calculation becomes a core part of the system, is found to be stable and performance could be improved by materializing it, then it may be added to the Data Warehouse and hence the <span class="caps">ETL</span> code. We find that as the requirement becomes more stable it filters from front-end tools back into the database Data Warehouse.</p>
<p>This gives us the foundation of our methodology for approaching projects. It is initially important to gather and then implement/prototype the core or high priority requirements in the first few iterations of development, our understanding of these requirements only needs to be ‘good enough’ at this point and it is important to get them in front of the users. <span class="caps">OBI EE</span> can be used to do this. As they are then reviewed and refined they can be incorporated further into the system.</p>
<p>Once we requirements reach a certain stability they can be passed back toward the back-end of the system on future development iterations. This means that the more complex coding tasks, the ones that take time can be carried out on better understood requirements. This is where the Data Warehouse fits in and where it still can and does add value.</p>
<p>Importantly <span class="caps">OBI EE</span> gives us the ability to handle these changes seamlessly, we can change the physical objects supporting the business model. Mark wrote a series of postings about this process entitled <a href="http://www.rittmanmead.com/blog/2008/03/16/re-wiring-obiee-logical-models-to-use-a-data-warehouse-part-1/">Migrating <span class="caps">OBIEE</span> Logical Models to use a Data Warehouse</a>, these describe how to re-wire the business model to a Data Warehouse.</p>
<p>So this means that old school ways, such as Data Warehousing, are not dead, interesting the latest <a href="http://www.ralphkimball.com/html/designtips.html">Kimball Design Tip</a> (I think you have to subscribe to get access – it’s free) discusses how names of the concepts of what we now call BI&DW have changed. What will make a project successful is finding a balance between how to optimally use each of the technologies, depending on user requirements, complexity and quality of source data and a myriad of other factors, and most importantly having a methodology that can stitch these approaches together.</p>