OBIEE 11.1.1.9 Now Supports HiveServer2 and Cloudera Impala
As you all probably know I’m a big fan of Oracle’s BI and Big Data products, but something I’ve been critical of is OBIEE11g’s lack of support for HiveServer2 connections to Hadoop clusters. OBIEE 11.1.1.7 supported Hive connections using the older HiveServer1 protocol, but recent versions of Cloudera CDH4 and CDH5 use the HiveServer2 protocol by default and OBIEE 11.1.1.7 wouldn’t connect to them; not unless you switched to the Windows version of OBIEE and used the Cloudera ODBC drivers instead, which worked but weren’t supported by Oracle.
OBIEE 11.1.1.9 addresses this issue by shipping more recent DataDirect ODBC drivers for Hive, that are compatible with the HiveServer2 protocol used by CDH4 and CDH5 (check out this other article by Robin on general new features in 11.1.1.9). Oracle only really support Hive connectivity for Linux installs of OBIEE, and the Linux version of OBIEE 11.1.1.9 comes with the DataDirect ODBC drivers already installed and configured for use, all you have to do then is set up the ODBC connection in the odbc.ini file on Linux and install the Cloudera Hive ODBC drivers on your Windows workstation for the Admin too (the Hive ODBC drivers that Oracle supply on MOS still look like the old HIveServer1 version, though I could be wrong). To check that it all worked on this new 11.1.1.9 version of OBIEE11g I therefore downloaded and installed the Windows Cloudera Hive ODBC drivers and set up the System DSN like this:
and set up a corresponding entry in the Linux OBIEE 11.1.1.9’s odbc.ini file, like this:
with the key thing being to make sure you have matching DSN names on both the Windows workstation (for the Admin tool initial datasource setup and table metadata import) and the Linux server (for the actual online connection to Hive from the BI Server, and subsequent data retrieval). One thing I did notice was that whilst I could connect to the Hive database server and set up the connection in the Admin tool, I couldn’t view any Hive tables and had to manually create them myself in the RPD Physical Layer - this could just be a quirk on my workstation install though so I wouldn’t read too much into it. Checking connectivity in the Admin tool then showed it connecting properly and retrieving data from Hive on the Hadoop cluster. I didn’t test Kerberos-authentication connections but I’m assuming it’d work, as the previous version of OBIEE 11.1.1.7 on Linux just failed at this point anyway. The docs are here if you’d like to look into any more details, or check the full set of setup steps.
For Cloudera Impala connections, you’re directed in the docs to download the Windows Cloudera Impala ODBC drivers as Oracle don’t even ship them on MOS, but again the Linux install of OBIEE 11.1.1.9 comes with DataDirect Impala drivers that are already setup and ready for use (note that if you upgrade from 11.1.1.7 to 11.1.1.9 rather than do the fresh install that I did for testing purposes, you’ll need to edit the opmn.xml file to register these updated DataDirect drivers). Then it’s a case of setting the Windows System DSN up for the initial metadata import, like this:
then creating a corresponding entry in the Linux server’s odbc.ini file, like this:
Note that the docs do mention the issue with earlier versions of Impala where the Impala server is expecting LIMIT clauses when using ORDER BY in Impala SQL queries, and gives a couple of workarounds to fix the issue and stop Impala expecting this clause; for more recent (CDH5+) versions of Impala this requirement is in-fact lifted and you can connect-to and use Impala without needing to make the configuration change mentioned in the doc (or use the workaround I mentioned in this earlier blog post). Checking connectivity in the Admin tool then shows the connection is making its way through OK, from the Windows environment to the Linux server’s ODBC connection:
and creating a quick report shows data returned as expected, and considerably quicker than with Hive.
As I said, I’ve not really tested either of these two connections using Kerberos or any edge-case setups, but connectivity seems to be working and we’re now in a position where OBIEE11g can properly connect to both Hive, and Impala, on recent CDH installs and of course the Oracle Big Data Appliance. Good stuff, now what about Spark SQL or ElasticSearch..?