mySQL As A Data Warehousing Platform?
Julian Ford passed on a link to an article by Michael J. Radwin on Buildng Data Warehouses With mySQL. The article refers to a presentation by John Ashenfelter who's also written a book on mySQL and data warehousing which is due out in July 2004.
Data warehousing using mySQL on one hand sounds counter-intuitive but also has a few things going for it. On the plus side, data warehouses don't need to be 24x7x365 available, and transaction integrity and support is less of an issue, which are usually seen as weak areas for mySQL. On the other hand, mySQL is unlikely to scale up to the terabyte-size data warehouses you often come across now, and it lacks features such as materialized views and query rewrite, star joins, bitmap indexes, upserts, views, subqueries, incremental or hot backups, stored procedures (for ETL) and so on. I'm reluctant to criticize mySQL but I'm not sure I'd recommend it for a data warehousing project.
From looking another mySQL and data warehousing presentation ("Large Volume Data Transformation and Warehousing Using MySQL" by a Swedish company called Lentus) it's interesting that they concentrate on using mySQL for the data staging and ETL process, rather than querying and analysis. This would fit in well with the strengths of mySQL (simple database, not overly complex, no need for complex transactional support) although Oracle again would score points through it's built in ETL functions such as streams, external tables, upserts, table functions and so on. Where you're going to run into problems is when you try to query a very large data set using mySQL - Oracle has a number of enhancements that make star transformations (queries run against star schemas) very efficient and has mechanisms for redirecting user queries from base tables to summary tables without user intervention.
Having said all that, there probably is a role for mySQL in data warehousing, most probably as a reporting database for other mySQL databases. If you're familiar with mySQL it's more a case of learning the data warehousing process and then applying it to your knowledge of mySQL, and in all likelihood your database isn't going to be all that large anyway - it's more about bringing data together and getting a 360 degree view of your organization rather than dealing with very large datasets. Interestingly and nonetheless, there's a bit of a cottage industry springing up around reporting and analysis around mySQL and open source, with a product called OLAP4All offering a relational OLAP solution with a mySQL backend, Mondrian being an open-source Java multidimensional OLAP engine that implements Microsoft's MDX query language, and JPivot, a JSP tag library for delivering Mondrian OLAP data over the web.
Has anyone used mySQL or Mondrian for data warehousing projects? If so, get in touch as I'd be interested in how things turned out.