Diagnostics, Logging and the OBIEE 11g EM Log Viewer
Earlier this week I took a closer look at Enterprise Manager within OBIEE 11g, and in particular the MBeans that Enterprise Manager uses to perform its administration functions. Today I wanted to take a look at the logging that's available within Enterprise Manager, together with the Log Viewer Enterprise Manager page that lets you search through all of the log files across all the nodes in the Oracle BI Domain.
Under the covers, logging and diagnostics in OBIEE are handled by two technologies: Oracle DMS (Dynamic Monitoring Service), which provides counters that can be accessed through the various JMX MBeans I talked about earlier in the week, and ODL (Oracle Diagnostic Logging), a framework that takes server log files and converts them to XML so that they can be searched and parsed. DMS and ODL have been around since the days of OC4J (DMS was, I think, the source of the counters you could query in OBIEE 10g), and the system components within OBIEE still generate the same nqquery.log, nqserver.log and other files, which are then processed by ODL and made available for analysis in Enterprise Manager.
Oracle's Mike Durran is planning to give a presentation at our BI Forum in May on accessing the DMS data via MBeans and persisting it in a relational database, whereas the area I'll be covering at this event is logging and the use of the Log File Viewer in Enterprise Manager.
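To make the idea of structured, searchable log entries concrete, here's a minimal Python sketch that splits one log line into fields. It assumes a simplified ODL text-format line: five leading bracketed fields (timestamp, component, message type and level, message ID, module) followed by optional [name: value] attributes such as ecid and tid. The attribute set, the LogEntry class and the parse_odl_line function are illustrative assumptions, not an Oracle API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# One bracketed field, e.g. "[ERROR:1]" or "[ecid: 0000Ie2t5fy]"; may be empty "[]".
FIELD = re.compile(r"\[([^\]]*)\]\s*")

# Assumed subset of ODL attribute names; any other bracketed field ends the header.
ATTR_KEYS = {"ecid", "tid", "userId"}


@dataclass
class LogEntry:
    timestamp: str
    component: str
    level: str
    message_id: str
    module: str
    attrs: dict[str, str] = field(default_factory=dict)
    message: str = ""


def parse_odl_line(line: str) -> LogEntry | None:
    """Split a simplified ODL text line into header fields, attributes and message."""
    pos = 0
    positional: list[str] = []
    # The first five bracketed fields are positional.
    while len(positional) < 5:
        m = FIELD.match(line, pos)
        if m is None:
            return None  # not an ODL-style line
        positional.append(m.group(1))
        pos = m.end()
    # Then zero or more "[name: value]" attributes that we recognise.
    attrs: dict[str, str] = {}
    while (m := FIELD.match(line, pos)) is not None:
        key, sep, value = m.group(1).partition(": ")
        if not sep or key not in ATTR_KEYS:
            break  # e.g. "[nQSError : 17014]" belongs to the message text
        attrs[key] = value
        pos = m.end()
    return LogEntry(*positional, attrs=attrs, message=line[pos:].strip())
```

Once a line is broken up like this, the ECID is just entry.attrs["ecid"], which is what makes cross-file searching possible.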
You can get to the Log File Viewer by selecting Capacity Management > Diagnostics within Enterprise Manager. This page first displays the most recent errors and warnings across the Oracle BI Domain, and also has a tab for setting the maximum size of logs, when they rotate and so on.
If you scroll down, you'll see direct links to the most important logs (though not the query log; more on this later), and to the Log Viewer utility itself.
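The maximum log size and rotation settings on the Diagnostics page follow the familiar size-based rotation pattern. As an analogy only (this uses Python's standard logging.handlers.RotatingFileHandler, not anything Enterprise Manager runs), the sketch below caps a file at max_bytes and keeps a fixed number of numbered backups; the file name demo.log and the sizes are arbitrary choices for illustration.

```python
import logging
import logging.handlers
import os
import tempfile


def demo_rotation(directory: str, max_bytes: int = 500, backups: int = 3) -> list:
    """Write enough records to force several rotations; return the files left on disk."""
    path = os.path.join(directory, "demo.log")
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8"
    )
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger = logging.getLogger("rotation-demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo records out of the root logger
    logger.addHandler(handler)
    try:
        for i in range(200):
            logger.info("message number %04d padded to a fixed width", i)
    finally:
        logger.removeHandler(handler)
        handler.close()
    # Older backups beyond `backups` are deleted as rotation proceeds.
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_rotation(d))  # demo.log plus demo.log.1 .. demo.log.3
```

The trade-off is the same one the EM settings expose: a smaller maximum size or fewer backups saves disk space, at the cost of losing older history sooner.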
Clicking on the Presentation Services Log entry, for example, brings up the Log Viewer with this particular log displayed. Through ODL, the Log Viewer aggregates the Presentation Server logs across all nodes in the BI cluster, which is essential if you're trying to track down an issue but have scaled out or scaled up your Oracle BI Domain, and therefore don't know for sure which Presentation Server ran your analysis.
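Conceptually, aggregating one log across several nodes means interleaving each node's entries into a single time-ordered stream. Here's a minimal Python sketch of that idea, assuming each node's entries are already in time order and carry ISO-8601 timestamps in the same format and time zone (so they sort correctly as strings); merge_node_logs and the node names are hypothetical, not part of EM or ODL.

```python
from __future__ import annotations

import heapq
from typing import Iterable, Iterator


def _tag(node: str, entries: Iterable[tuple[str, str]]) -> Iterator[tuple[str, str, str]]:
    """Attach the node name to each (timestamp, message) pair from one node's log."""
    for timestamp, message in entries:
        yield timestamp, node, message


def merge_node_logs(
    per_node: dict[str, Iterable[tuple[str, str]]],
) -> Iterator[tuple[str, str, str]]:
    """Lazily merge per-node logs (each already sorted) into one time-ordered stream."""
    streams = [_tag(node, entries) for node, entries in per_node.items()]
    # heapq.merge relies on every input already being sorted by the key.
    return heapq.merge(*streams, key=lambda entry: entry[0])


if __name__ == "__main__":
    logs = {
        "node1": [("2010-12-10T14:00:01", "analysis requested"),
                  ("2010-12-10T14:00:05", "analysis rendered")],
        "node2": [("2010-12-10T14:00:03", "query sent to BI Server")],
    }
    for timestamp, node, message in merge_node_logs(logs):
        print(timestamp, node, message)
```

Keeping the node name on each entry is the important design choice: once the streams are merged, it's the only way to tell which Presentation Server actually handled the request.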
Something that's not immediately obvious, particularly if you've got a small screen, is that there's a "detail" area under the listing of log messages that you can display by dragging up a divider. The screenshot below shows this dividing line being pulled up, and more details being displayed about a particular warning.
Notice also the ECID hyperlink in this detail area? This is the Execution Context ID, something new in OBIEE 11g that helps administrators link together entries in the various log files that relate to a particular administration or query "transaction".
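In effect, the ECID is a join key across otherwise unrelated files. A minimal Python sketch of grouping on it, assuming each parsed entry is a plain dict with hypothetical "file", "ecid" and "message" keys; entries without an ECID are left out, since there's nothing to correlate them on.

```python
from collections import defaultdict


def group_by_ecid(entries):
    """Index log entries from any number of files by their ECID."""
    groups = defaultdict(list)
    for entry in entries:
        ecid = entry.get("ecid")
        if ecid:  # untagged entries can't be correlated this way
            groups[ecid].append(entry)
    return dict(groups)


if __name__ == "__main__":
    entries = [
        {"file": "nqquery.log", "ecid": "abc123", "message": "query received"},
        {"file": "nqserver.log", "ecid": "abc123", "message": "connection failed"},
        {"file": "nqserver.log", "ecid": None, "message": "server starting"},
    ]
    for entry in group_by_ecid(entries)["abc123"]:
        print(entry["file"], entry["message"])
```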
So how does OBIEE 11g logging, and the new Log Viewer, work in practice? Let's take a couple of examples of errors that can occur, and see how easy it is to diagnose the root cause using OBIEE 11g.
The first one we'll simulate is around data source availability. Often on a system you'll get messages from users saying "the database is down" or "the dashboard isn't working", and if they send you a screenshot, it's something fairly unhelpful like this:
So unless they've still got the analysis open and can press the "+" button to show the ODBC error, you've got to trawl through the logs to find out what's gone wrong. On a single-node system with a single BI Server, and assuming you've got access to the filesystem, you can bring up Windows Explorer and look through the log files directly; but if you've got a clustered system and no direct access to the filesystem, the web-based Log Viewer in Enterprise Manager should come in handy.
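On that single-node setup, a quick look through the log doesn't need much tooling. A minimal Python sketch of a hypothetical recent_errors helper that keeps only the last few hundred lines and returns those mentioning nQSError; the default limits are assumptions, and it takes any iterable of lines so an open log file can be passed straight in.

```python
from __future__ import annotations

from collections import deque
from typing import Iterable


def recent_errors(lines: Iterable[str], max_lines: int = 500,
                  needle: str = "nQSError") -> list[str]:
    """Return the lines containing `needle` among the last `max_lines` lines."""
    tail = deque(lines, maxlen=max_lines)  # memory stays bounded for large logs
    return [line.rstrip("\n") for line in tail if needle in line]


# Usage (the log location is installation-specific, so it stays a parameter):
# with open(path_to_nqserver_log, encoding="utf-8", errors="replace") as log:
#     for line in recent_errors(log):
#         print(line)
```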
Navigating to the View / Search Log Files area of the Diagnostics page on Enterprise Manager, I locate the Server Log entry and click on it.
This brings up the Log Viewer, and displays the most recent entries in the server log (stored in the nqserver.log file).
In the log I can see the message [nQSError : 17014] Could not connect to Oracle database, and if I expand the detail area below it, I can see what the problem is: the account is locked.
I then click on the ECID link to see what other client actions were involved in this transaction, and from the list of other errors and warnings tagged with this ECID I can see that the BI Server made five attempts to access the database.
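When you're scanning a lot of entries, pulling the nQSError codes out of the message text makes errors like this one easy to count or filter. A small Python sketch, assuming the codes appear in the bracketed form shown above, with or without a space before the colon; nqs_error_codes is a hypothetical helper.

```python
from __future__ import annotations

import re

# Matches both "[nQSError: 17014]" and "[nQSError : 17014]".
NQS_ERROR = re.compile(r"\[nQSError\s*:\s*(\d+)\]")


def nqs_error_codes(message: str) -> list[int]:
    """Return every nQSError code in a log message, in order of appearance."""
    return [int(code) for code in NQS_ERROR.findall(message)]


print(nqs_error_codes("[nQSError : 17014] Could not connect to Oracle database."))
# [17014]
```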
One thing I'll be covering in my May Masterclass session is how far this ECID goes through the query process; for example, does it tie together entries in the FMW and WLS logs with what's in the query log, and can we trace this through to other layers in the stack?
So far so good. Now let's take another example, where a common administration error occurs: for some reason, the BI Server fails to start when it's restarted.
Now this error message handily comes with a View Log Messages button within the error dialog, which takes you directly to the Log Viewer, across all logs, focused on the ECID of the transaction that caused the error. Let's take a look at what the Log Viewer shows us:
OK, lots of Java stack traces and messages about failed operations and methods. All of the entries shown in the log viewer are for the same ECID, so I look at some other entries:
If you followed the posting earlier in the week about the various admin MBeans, what's happening here is that various MBean methods are failing as Enterprise Manager tries to take the new RPD online but, for some reason, can't. Prior to 11g I'd have gone straight to the nqserver.log file to see why the BI Server won't start, so I close down the Log Viewer, display the list of logs instead, and again click on the Server Log entry, like this:
and straightaway I can see what the issue is: I entered the wrong password into Enterprise Manager. All EM does when you type the password twice is check that the two entries match each other (not that the password is actually correct), so the BI Server can't start afterwards, as EM is supplying the wrong password from the Credential Store.
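The difference between confirming a password and validating it is easy to show in code. In this deliberately simplified Python sketch (hypothetical helpers, not how Enterprise Manager or the BI Server actually store or check the RPD password), a match-only check happily accepts two identical wrong entries, while a check against a stored hash rejects them.

```python
import hashlib
import hmac


def entries_match(first: str, second: str) -> bool:
    """Confirmation check: are the two typed entries identical? Says nothing about correctness."""
    return hmac.compare_digest(first.encode(), second.encode())


def password_is_correct(candidate: str, salt: bytes, stored_hash: bytes) -> bool:
    """Validation check: does the candidate hash to the stored value?"""
    digest = hashlib.pbkdf2_hmac("sha256", candidate.encode(), salt, 100_000)
    return hmac.compare_digest(digest, stored_hash)


if __name__ == "__main__":
    salt = b"example-salt"  # fixed only to keep the demo reproducible
    stored = hashlib.pbkdf2_hmac("sha256", b"right-password", salt, 100_000)
    print(entries_match("wrong-password", "wrong-password"))    # True
    print(password_is_correct("wrong-password", salt, stored))  # False
```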
So why wasn't this password error showing up before, when I displayed all the actions relating to the ECID? Well, this particular entry doesn't get tagged with an ECID, so although it was in the log, it wasn't displayed when I first brought up the Log Viewer, which is a bit annoying. I've found this happens quite a lot with errors around startup and shutdown, which is ironic, as the View Log Messages button then ends up showing you lots of log activity that doesn't actually lead you to the root cause of the error. I'm putting this down to "version 1" syndrome, but it does mean you often need to be a bit creative when using the Log Viewer, as sometimes what you're looking for won't come up in normal ECID searches.
So that's it for logging and the Log Viewer in OBIEE 11g. I'll wrap up this mini-series in a couple of days' time with a look at how cache management has changed in this new release.
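One way to be creative here is to use the ECID only to find when the failing transaction happened, then widen the search to every entry in that time window, tagged or not. A minimal Python sketch, assuming entries are dicts with hypothetical "time" (ISO-8601), "ecid" and "message" keys and an arbitrary 30-second margin; this is a manual workaround, not a Log Viewer feature.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def entries_around_ecid(entries: list[dict], ecid: str,
                        margin: timedelta = timedelta(seconds=30)) -> list[dict]:
    """Return all entries, with or without an ECID, close in time to the given ECID's activity."""
    tagged_times = [datetime.fromisoformat(e["time"]) for e in entries
                    if e.get("ecid") == ecid]
    if not tagged_times:
        return []
    # Widen the ECID's first-to-last span by the margin on both sides.
    start = min(tagged_times) - margin
    end = max(tagged_times) + margin
    return [e for e in entries
            if start <= datetime.fromisoformat(e["time"]) <= end]
```

The margin is a trade-off: too small and you miss the untagged startup error, too large and you're back to reading the whole log.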