So if you're administering OBIEE and you're looking out for connectivity issues, what's the sort of thing that can go wrong? From where I'm standing, the major things to look out for include:
- The Presentation Server going down
- The Presentation Server being up, but the BI Server has gone down
- The Scheduler is down or hasn't actually been configured properly
- The J2EE Application Server is down whilst the OBIEE components are up
- The Java Host server is down (and what does this affect?)
- All of the OBIEE components are up including the J2EE server, but the source database is down
- As above, but this time it's Essbase that's down
If you're an administrator, what is the impact of these parts of the system failing (we'll ignore clustering for now), and if they are down how can you find out what's wrong?
Starting with the Presentation Server: in theory it shouldn't crash by itself, but I've seen this happen when, for example, a bug in the Presentation Server code caused a segfault when a pivot table contained too many cells. In general, though, problems with the Presentation Server come down to the process simply not running, and when running with WebLogic as the J2EE Application Server, users will get a message like the one below if they try to connect while the Presentation Server service isn't running:
This error is caused by the J2EE Presentation Services plug-in not being able to communicate with the Presentation Services itself. If Presentation Services is running on a Windows box, a quick check in the Windows Services applet will show if the service is running OK, and as we can see in the screenshot below it's actually down.
So why has it gone down? The best places to check are the log file at
- $ORACLEBIDATA/web/log/sawlog0.log
if the Presentation Server was up at one point but then crashed or closed down for some reason, or the Windows Event Viewer (System Log) if the service won't actually start in the first place.
Unfortunately, the messages in the System Log don't really tell you what the problem is (the error above was actually caused by a developer editing the instanceconfig.xml file and entering the details incorrectly, which stops the Presentation Server service from starting at all). At least, though, if you try to restart the service and find that it won't start, you can probably assume it's due to a change in one of the config files, and the next step is to find out who introduced the change. In conclusion, then, if users can connect to the J2EE application server but not to the Presentation Server: check that the service is up; check the sawlog0.log file to see if there's been an abnormal exit, and if there has, work out why; if not, try to restart the service; and if it won't restart, check whether anyone's mucked around with instanceconfig.xml or the other Presentation Server configuration files.
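Since the usual culprit is a bad edit to instanceconfig.xml, a quick first check is whether the file still parses as XML at all, followed by a diff against your last known-good copy. Below is a minimal Python sketch that assumes you keep such a backup copy; `check_well_formed` and `diff_against_backup` are hypothetical helpers, not part of OBIEE, and a well-formedness check only catches broken markup, not a setting that is syntactically valid but wrong.

```python
import difflib
import xml.etree.ElementTree as ET
from pathlib import Path


def check_well_formed(path):
    """Return None if the file parses as XML, otherwise the parser's error message."""
    try:
        ET.parse(path)
        return None
    except ET.ParseError as exc:
        return str(exc)


def diff_against_backup(current, backup):
    """Return a unified diff (list of lines) showing what changed since the backup."""
    old = Path(backup).read_text().splitlines(keepends=True)
    new = Path(current).read_text().splitlines(keepends=True)
    return list(
        difflib.unified_diff(old, new, fromfile=str(backup), tofile=str(current))
    )
```

Point `current` at the live instanceconfig.xml (typically under $ORACLEBIDATA/web/config in a 10g install, but check your own layout) and `backup` at your saved copy, then print `"".join(diff_against_backup(current, backup))` to see exactly which lines someone changed.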
So what if it's not the Presentation Server that's down, but the J2EE Application Server host instead? If this happens, users won't get a Java error message when they try to connect; instead they'll get the usual "The page cannot be displayed" message (or whatever your OS and browser show when a website can't be reached).
In this case, you need to check whether your application server is up and what's in its log files; exactly where those are depends on which application server you're using, and this is effectively out of scope for OBIEE. But if this component isn't working (and therefore isn't providing the bridge to the Presentation Server application), your users won't be able to access any of your dashboards and reports.
Finally on the web front, what happens if the Java Host process is down but everything else is up? Well, if this process is down you should normally still be able to log in, but you'll notice that your graphs aren't being displayed, as the Java Host process, amongst other things, talks to the third-party Corda process that renders all the graphs.
If you've got OBIEE integrated with EPM Suite and you are trying to log in using credentials stored in Shared Services, you'll find this will fail as well, as the Java Host process is involved in communication between the BI Presentation Server and Shared Services.
Again, if it's down, check the Services Applet to see if it's actually running, and if it's not, check the log file at
- $ORACLEBIDATA\web\log\javahost\jhost0.log.0.log
to see if anything is stopping it from starting up (or caused it to crash). If you're using EPM integration and you suspect the problem might be down to Shared Services, you can also check the log at $ORACLEBI\web\javahost\config\hss\logs\registry\registry.log to see if there's anything in there. Incidentally, if you're having problems authenticating against Shared Services but you think you've got all the elements set up correctly, the following two Presentation Services logs will show you whether authentication is failing (indicating that it can't reach Shared Services) and whether Shared Services is in fact reachable from Presentation Services:
- $ORACLEBIDATA\web\log\sawlog0.log
- $ORACLEBIDATA\web\log\SharedServices_Security_Client.log
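When working through these logs, it's usually the last few lines before the crash or failed start that matter. Here's a minimal Python sketch for pulling the tail of each Presentation Services and Java Host log mentioned above; it assumes the $ORACLEBIDATA environment variable is set, and `tail` and `default_logs` are hypothetical helper names.

```python
import os
from collections import deque
from pathlib import Path


def tail(path, n=20):
    """Return the last n lines of a text file, or [] if the file doesn't exist."""
    path = Path(path)
    if not path.exists():
        return []
    # errors="replace" so odd bytes in a log don't stop us reading it
    with path.open(errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


def default_logs(data_dir=None):
    """The log files discussed above, resolved under $ORACLEBIDATA (or data_dir)."""
    base = Path(data_dir or os.environ.get("ORACLEBIDATA", "."))
    return [
        base / "web" / "log" / "sawlog0.log",
        base / "web" / "log" / "javahost" / "jhost0.log.0.log",
        base / "web" / "log" / "SharedServices_Security_Client.log",
    ]
```

For example, `for log in default_logs(): print(log, *tail(log), sep="\n")` dumps the end of each file in turn. The same `tail` helper works just as well on the BI Server's NQServer.log under $ORACLEBI/server/log.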
So what if all the Web elements are up, Shared Services is up but the BI Server is down? If this happens, users will be able to bring up the dashboard login screen but when they try and log in they'll get an error message saying the BI Server is unavailable:
Expanding the error message as I've done above shows that the Presentation Server can't connect to the BI Server via the BI Server ODBC client. Again, first check whether the BI Server service is running. If it is, but the Presentation Server still can't connect to it, this may be because of a network issue (assuming they are on different physical boxes); more often, though, it's because the BI Server itself is down, either because something's caused it to crash or because someone's tried to start it against an invalid or corrupted RPD. Taking a look at the BI Server log file at
- $ORACLEBI/server/log/NQServer.log
we can see that this BI Server is down because there are no valid subject areas in the repository, probably because they are all invalid or inconsistent.
So why might the BI Server, Presentation Server, Java Host process or indeed the Scheduler or Cluster Controller services be down? In my experience it's usually because someone has fiddled around with the config files or tried to introduce a new RPD version that's got errors in it, although it's not unknown for any of the server processes to fail or crash under abnormal load (particularly when running them on less widely used platforms, or when integrating the BI Server with newer technologies such as Essbase, Shared Services and the like). But generally it's because someone has "done something", and as long as you keep daily backups of the config files and repository files you can usually get things back up and running by restoring the previous working copy of the relevant file.
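Those daily backups don't need to be anything elaborate. As a minimal Python sketch, assuming you choose the file list yourself (for example instanceconfig.xml, NQSConfig.INI and your RPD), the hypothetical `backup_files` helper below copies each file into a dated folder so that a known-good version from a previous day is always to hand; you could schedule it with cron or the Windows Task Scheduler.

```python
import shutil
from datetime import date
from pathlib import Path


def backup_files(files, backup_root, day=None):
    """Copy each existing file into backup_root/YYYY-MM-DD/.

    Timestamps are preserved (copy2). Missing files are skipped.
    Returns the list of backup paths written.
    """
    day = day or date.today()
    target_dir = Path(backup_root) / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in map(Path, files):
        if source.is_file():
            destination = target_dir / source.name
            shutil.copy2(source, destination)
            written.append(destination)
    return written
```

The copies are stored by file name only, so two files with the same name from different directories would overwrite each other. If that applies to your setup, mirror the source directory structure under the dated folder instead.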
Moving on, then: what if all the OBIEE components are working but one of the source databases goes down? This one is fairly easy to spot, as the error message shows up directly in the dashboard (assuming caching is not switched on and results are being retrieved directly from the database); in the screenshot below, the report cannot run because the database itself is down:
In the example below, the database is up but the account used in the connection pool settings doesn't have access to the required tables.
To find out what's happened, firstly check that the database you are using is actually up (a good way of doing this if all you have access to is the Administration tool, is to right-click on one of the database tables and select View Data), and in the case below, we can see that it's because the TNS Listener is down that we can't access the database.
To find out why it's down, use the usual database investigative techniques (check whether the listener is up, check whether the database instance is up, check the alert log and so on). Also, if you're getting a permissions error, check whether the developers have moved to :USER and :PASSWORD based connection pool logins, check whether the database account being used to log in has the correct permissions, and indeed check that nobody has accidentally dropped the tables in question.
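To tell a dead process apart from a network problem, whether it's the Presentation Server failing to reach the BI Server or OBIEE failing to reach the TNS Listener, it can help to try a raw TCP connection from the machine that's having trouble. Here's a minimal Python sketch; the host names are placeholders, and the ports assume common defaults (9703 is the usual BI Server port set in NQSConfig.INI in 10g, 1521 the usual Oracle listener port), so check your own configuration.

```python
import socket

# Placeholder hosts; the ports are common defaults, not guaranteed values.
EXAMPLE_TARGETS = {
    "BI Server": ("biserver.example.com", 9703),
    "TNS Listener": ("dbserver.example.com", 1521),
}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False
```

A closed port while the service itself shows as running points towards a network or firewall issue (or the service listening on a different port). An open port only tells you something is accepting connections, so a listener that answers can still sit in front of a database instance that's down.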
It's a similar story for Essbase. If you try and run a report and Essbase is down, again you'll get an error message in the report, like this:
and again, you'll need to check whether the Essbase server process is running, whether there's a network problem stopping OBIEE reaching it and so on, using the usual tools and log files to diagnose and resolve the issue.
By the way, if you're running the OBIEE server components on Linux or Unix rather than Windows, there are usually equivalents to the Event Viewer, process viewers and so on that you get in Windows, for example the System Monitor application available under Red Hat Linux / Oracle Enterprise Linux.
When checking whether the BI Server is up and running, look for the "nqserver" process; the Presentation Server runs under the "sawserver" process name (hangovers from the old nQuire and Siebel Analytics days). If you haven't got access to the GUI, you can use "ps -ef | grep sawserver" from the command line, for example, to check whether a process is up. Once you've found that out, checking through the logs is more or less the same as when running under Windows.
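If you want to script that check rather than run it by hand, here's a minimal Python sketch that runs `ps -ef` and looks for the two process names above. It assumes the standard `ps -ef` column layout (the command starts in the eighth column), and `find_processes` and `running_obiee_processes` are hypothetical helper names. Matching only the executable name means, unlike a plain `grep`, it won't report the grep command itself.

```python
import subprocess

# "nqserver" is the BI Server, "sawserver" the Presentation Server.
OBIEE_PROCESSES = ("nqserver", "sawserver")


def find_processes(ps_output, names=OBIEE_PROCESSES):
    """Map each process name to the `ps -ef` lines whose executable matches it."""
    found = {name: [] for name in names}
    for line in ps_output.splitlines()[1:]:  # skip the "UID PID PPID ..." header
        fields = line.split(None, 7)  # the 8th field onwards is the command
        if len(fields) < 8:
            continue
        executable = fields[7].split()[0].rsplit("/", 1)[-1]
        for name in names:
            if name in executable:
                found[name].append(line)
    return found


def running_obiee_processes():
    """Run `ps -ef` and report, per process name, whether it appears to be running."""
    result = subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True)
    return {name: bool(lines) for name, lines in find_processes(result.stdout).items()}
```

Calling `running_obiee_processes()` returns a dictionary such as `{"nqserver": True, "sawserver": False}`, which tells you which log to look at first.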
So there you have it, a few tips on troubleshooting basic OBIEE connectivity issues across the various server components. If I've missed anything or you've got anything else to add, just add a comment to the post.