An Interview with Margy Ross, Kimball Group
One of the events we're helping to organize later in the year is the Kimball University "Data Warehouse Lifecycle In-Depth" event, running in London in May. This is an excellent chance to learn the fundamentals of the Kimball methodology directly from the authors of the book.
As part of setting up the event, Margy Ross, President of Kimball Group, agreed to do an interview with us. As this was such a good opportunity, I also asked Peter Scott, David Aldridge, Aaron Merry and Joe Leva what they might want to ask Margy, Ralph and the rest of the Kimball University team, as I know that, like me, they're interested in data warehousing methodology. Here are the questions and the resulting interview:
[Aaron Merry, Bicon] : "What do you feel is the biggest challenge IT managers face when implementing a data warehouse? What advice would you give a prospective data warehouse manager when walking into their manager's office and making a case for a data warehouse?"
[Margy Ross] : "The biggest challenge is effectively partnering with the business organizations to deliver a DW/BI solution that the business will embrace to support their decision making. Often what undermines the partnership is the IT manager’s lack of understanding of how the business works and the major opportunities for leveraging business intelligence to add value. In terms of advice for making a case for data warehousing, it’s critical that the DW manager walks in with their business counterparts to collectively establish the business case."
[David Aldridge] : "What do you find are the most common mistakes made in dimensional warehouse design, and data warehousing in general?"
[Margy Ross] : "The most common mistake made in data warehousing is remaining overly focused on technological solutions, rather than focusing on the needs of the business. In the area of dimensional modeling, a fundamental error is failing to acknowledge that data needs to be structured differently for data capture versus data access, although more and more organizations are readily reaching that conclusion; once a commitment is made to dimensional modeling, then the common mistakes are failing to deliver the details, along with failing to focus on the importance of common, master dimension tables."
[Mark Rittman] : "There has been a lot of discussion over the past few years around active, or real-time data warehousing. Do you think such an approach works with an aggregated, dimensional data model?"
[Margy Ross] : "The real problem at the heart of this question is the reason we started building data warehouses in the first place: it was impossible to answer analytic questions with transactional data structures. Real-time data warehousing is usually an attempt to do the opposite: answer transactional questions with analytic data structures. The issue is how you properly cleanse the data, integrate it and align it across the enterprise, and keep track of changing attributes, all on a real-time basis. In most cases, you have to create separate, parallel structures that support real-time loading and querying, along with cleansed, aligned, detailed history to provide an analytic context. Finally, data is sometimes aggregated in dimensional models for query performance; however, it’s an unfortunate misunderstanding that all dimensional data models contain summary data. Aggregated data sets should always complement the more fundamental dimensional models containing low-level, bedrock atomic details."
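To make Margy's last point concrete, here is a minimal sketch of an aggregate that complements, rather than replaces, the atomic detail. The table contents, the column names and the `aggregate_by` helper are all invented for illustration; they are not taken from any Kimball material.

```python
from collections import defaultdict

# Hypothetical atomic fact rows: one row per product per sales transaction.
atomic_sales = [
    {"date": "day-1", "store": "London", "product": "tea", "sales_amount": 4.5},
    {"date": "day-1", "store": "London", "product": "coffee", "sales_amount": 2.0},
    {"date": "day-1", "store": "Leeds", "product": "tea", "sales_amount": 3.0},
]


def aggregate_by(rows, keys, measure):
    """Sum an additive measure over the given dimension keys."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)


# The aggregate is derived from the atomic rows purely for query speed;
# the atomic rows stay available for any question the aggregate cannot answer.
daily_store_sales = aggregate_by(atomic_sales, ["date", "store"], "sales_amount")
```

Because the summary is built from the atomic rows, it can be dropped and rebuilt at any time, whereas a model that stored only the summary could never answer a product-level question.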
[David Aldridge] : "Do you think different design approaches apply for relational and multi-dimensional OLAP systems?"
[Margy Ross] : "Whether you’re designing dimensional models for relational star schemas or multi-dimensional OLAP cubes, the design approach remains the same. The designer needs to clearly articulate the business process or event generating the performance metrics, the grain or level of detail at which the metrics are captured, the dimensions associated with the metrics, and then the metrics themselves."
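As a rough illustration of the four design decisions Margy lists, the sketch below records them in a small Python structure and flags any that have been left blank. The `DimensionalDesign` class and the retail example are assumptions made up for this post, not an official Kimball artifact, and the same record would apply whether the target is a star schema or an OLAP cube.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DimensionalDesign:
    """Hypothetical record of the four decisions named in the answer above."""

    business_process: str   # the process or event generating the metrics
    grain: str              # the level of detail of one fact row
    dimensions: list[str]   # the descriptive context for each fact row
    facts: list[str]        # the metrics themselves

    def missing_steps(self) -> list[str]:
        """Return the names of any design decisions left empty."""
        missing = []
        if not self.business_process.strip():
            missing.append("business process")
        if not self.grain.strip():
            missing.append("grain")
        if not self.dimensions:
            missing.append("dimensions")
        if not self.facts:
            missing.append("facts")
        return missing


# Example design for an invented retail sales process.
retail_sales = DimensionalDesign(
    business_process="retail sales",
    grain="one row per product per sales transaction",
    dimensions=["date", "store", "product"],
    facts=["quantity_sold", "sales_amount"],
)
```

Calling `retail_sales.missing_steps()` returns an empty list here, while a design with no declared grain would be reported as incomplete before any tables or cubes are built.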
[Peter Scott] : "How much do you feel that regulation and audit rules such as Sarbanes-Oxley and data privacy are acting against the concept of enterprise-wide conformed dimensions? Do you think regulation is perhaps forcing the move back to isolated data marts and information silos, as nobody is supposed to see data outside their need?"
[Margy Ross] : "Isolated data marts and silos are NOT the answer! These regulatory moves provide more motivation to establish robust data governance and stewardship responsibilities, staffed by knowledgeable subject matter experts from the business, to ensure that appropriate conventions, access rights, and change tracking policies are established and enforced. It’s possible to preserve privacy and still provide a robust analytic environment; it just takes a little more work in the ETL system."
[Joe Leva] : "Rittman Mead are pleased to be sponsoring your Data Warehouse Lifecycle In-Depth class in London in May. For how much longer do you expect to personally teach your classes, and what motivates you to continue teaching, writing and further developing the dimensional modeling approach?"
[Margy Ross] : "As a group, we remain committed to focusing on the DW/BI space; we have no plans to stop teaching, consulting and writing anytime soon. Our commitment is unwavering for many of the same reasons that caused us to initially focus on this market 25 years ago. DW/BI remains an exciting opportunity for IT professionals: it requires you to really understand what makes the business tick, gives you unprecedented access to senior business executives, and allows you to build solutions in partnership with the business that deliver true business value. Of course, these opportunities are often accompanied by associated challenges which keep it interesting for all of us."
Thanks, Margy, and we look forward to seeing you and the rest of the Kimball University team in London in May. If you're interested in coming along to the event, details are on our website and registrations are now open. This is a "must" for all data warehouse data modelers.