SAP’s New BI Accelerator

In my Log Buffer posting at the end of last week I made a passing reference to SAP's new BI Accelerator, along with the flippant comment "Yeah, right..." about the claim that it did away with the need to tune or pre-summarize your data warehouse. The article did, however, pique my interest, and having taken a closer look at the technology, it is actually rather interesting.

If you have a look around the 'net, there were in fact several news reports about the product when it was launched a couple of weeks ago. Some of them are a bit hyperbolic ("SAP's Agassi Confirms Oracle Killer Plans, Dishes on SOA", for example), but what SAP are actually offering looks fairly intriguing: a data warehouse appliance that has many of the features found in current Oracle offerings (the OLAP Option, the TimesTen in-memory database, bitmap indexes and IOTs), but wrapped up in an offering aimed squarely at their customers running SAP Business Warehouse.

What the product appears to be is an appliance - a combination of hardware and software aimed at performing a particular, defined task - to speed up the queries performed by SAP Business Warehouse, SAP's add-on to their ERP suite that performs ODS-style queries against their data (SAP BW is due to be a supported datasource in the upcoming Maui release of Oracle BI Suite Enterprise Edition). SAP BI Accelerator is delivered on either IBM or HP blade servers containing Intel Xeon "Woodcrest" 64-bit CPUs, the Linux operating system and the BI Accelerator application. It's "hot pluggable" into the SAP NetWeaver BI architecture and, once plugged in, starts to cache queries and improve the performance of ad-hoc BI queries.

The way the accelerator works is to capture the data requested by the NetWeaver-based application, parse it and tokenize it, and store it in a column-based data store in a similar way to Sybase IQ and SAND. Like Sybase IQ, it achieves impressive levels of data compression due to the column-based nature of its storage: blocks contain columns of data, not rows, and the values are sorted and tokenized, which makes low-cardinality data easy to compress and means the data is effectively its own index. The store is then horizontally partitioned across multiple blade servers in a similar way to Oracle RAC and ASM (shared data storage, with fact data processed across all blade nodes but dimensions having a particular node affinity), and loaded into RAM to create an in-memory database. Apart from the reduced need for disk storage, column-based databases can work particularly well for data warehouses: a query that asks for just a small subset of the measures in a fact table reads a much smaller amount of data from the cache, because each block contains only the data for a particular column, not the entire row with all the other measures that you don't really need.
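To make the tokenize-and-store-by-column idea a bit more concrete, here's a minimal Python sketch of dictionary encoding and a column-only aggregation. It's purely illustrative and not how the BI Accelerator is actually implemented: the function names, the sample rows and the choice of which columns count as dimensions are all my own assumptions.

```python
# Illustrative only: a toy column store with dictionary ("token") encoding.
# All names and data here are hypothetical, not SAP's implementation.

SAMPLE_ROWS = [
    {"region": "EMEA", "product": "Widget", "revenue": 100, "quantity": 4},
    {"region": "APAC", "product": "Widget", "revenue": 250, "quantity": 10},
    {"region": "EMEA", "product": "Gadget", "revenue": 75, "quantity": 3},
    {"region": "EMEA", "product": "Widget", "revenue": 50, "quantity": 2},
]


def dictionary_encode(values):
    """Replace each value with a small integer token.

    The sorted list of distinct values is the dictionary; a low-cardinality
    column shrinks to a short dictionary plus one small integer per row.
    """
    dictionary = sorted(set(values))
    token_of = {value: token for token, value in enumerate(dictionary)}
    return dictionary, [token_of[value] for value in values]


def build_column_store(rows, dimensions):
    """Pivot row-oriented records into one list per column.

    Dimension columns are dictionary-encoded; measures stay as plain lists.
    """
    store = {}
    for name in rows[0]:
        column = [row[name] for row in rows]
        store[name] = dictionary_encode(column) if name in dimensions else column
    return store


def sum_by(store, dimension, measure):
    """Aggregate one measure by one dimension, reading only those two columns."""
    dictionary, tokens = store[dimension]
    totals = [0] * len(dictionary)
    for token, amount in zip(tokens, store[measure]):
        totals[token] += amount
    return dict(zip(dictionary, totals))


if __name__ == "__main__":
    store = build_column_store(SAMPLE_ROWS, dimensions={"region", "product"})
    print(sum_by(store, "region", "revenue"))
```

Note that `sum_by` never touches the `product` or `quantity` columns, which is the point made above: a query over a couple of columns doesn't drag the rest of each row along with it.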

This architecture diagram from the Database Research Project in Germany shows how the product works under the covers:

SAP BI Architecture diagram

SAP have contracted Winter Corporation to produce some benchmarks which look fairly impressive, and certainly other people who've worked with in-memory and column-based databases seem to be pretty impressed with the technology. I'm not so sure about the wilder claims being made for it, in particular that it can do away with the need for a relational database completely for SAP customers. Column-based databases might work well for decision support but they're a terrible choice for OLTP applications, and as for in-memory databases, what happens when you pull the plug out? You've got to persist the data somewhere. Still, it certainly looks like an interesting technology, and one that SAP will no doubt roll out when in a competitive situation with Oracle over ERP suites with built-in analytics. Where it falls short, I'd say, compared to products such as the OLAP Option is probably around calculations, time-series analysis, multi-dimensional queries and so on, but as a competitor to summarization technologies such as materialized views, OLAP servers and even third-party products such as HyperRoll, and as a way of getting fast access to detail-level data, it looks interesting. Watch this space, as they say.