Using OBIEE for Reporting Directly Against Operational Systems

[This is a guest blog from Steve Devine from Hitachi Consulting, who we've known for a while and who attended our BI Forum in Brighton last May. Steve, Adrian, Venkat and I were discussing the issue of reporting against non-star schema data models and in particular the issues around reporting against HR data, which seems to bring out all the issues around reporting against OLTP data sets. Steve kindly agreed to write up his question as a blog post, and would welcome feedback on the questions he poses. Does OBIEE represent a good solution for non-star schema reporting, and how else has anyone reported against HR data? - MR]

"Since Oracle's strategic direction for all ad-hoc reporting is OBIEE, I wanted to understand how feasible this is and ensure that our customers migrating from Discoverer would not also have to invest in building data marts or warehouses. Whilst researching this I found few blog resources on the subject suggesting that there is very little real world experience of implementing OBIEE against an OLTP schema. The OBIEE documentation does not include any case studies either.

So the purpose of this blog is to get some feedback in order to understand what OLTP modelling has been undertaken to date and to start to build a library of 3NF model patterns and their OBIEE implementations, something which I am more than happy to be the custodian of.

The broader question of how suitable OBIEE is for reporting against Operational Systems came out of a project that I've recently been working on which involved the migration of one of our Discoverer products to OBIEE. The real challenge has been how to create a Business Model against a 3NF OLTP database based on Oracle HR. Having since experimented with modelling more straight forward domains such as an ERP Order -> Despatch -> Invoice process it is clear to me that HR data has some characteristics that are not found in other domains, and are not easily modelled in OBIEE. HR data is unique for the following reasons:

  • Full history is tracked in the operational system. For example, Oracle HR maintains a historical record of an Employee's addresses and their salary which they call Date Tracking. In BI terms every attribute is SCD Type 2
  • It is predominantly verbose and textual in nature. There are no measures.
This blog presents two alternative Business Models along with the pros and cons of each.

Here is a cut-down version of the physical model (the complete model contains seven more detail tables containing other Employee information such as Salary Payment Method, NI Class and Benefits etc).

Image001

Employee holds general information such as an individual's name, demographics and their job details. The others speak for themselves! Each table records history and has a pair of date attributes which record the period that the information was valid for. Oracle HR enforces integrity of these periods so that for an individual Employee the periods are contiguous and do not overlap. For an Employee, the periods are different in each table though - an Employee is likely to change Jobs and Salary more frequently than they change addresses for example – so the join between the Employee and detail tables is just the Employee Id.

A typical requirement would be to report current Employee details such as their job, address and salary for all Employees who currently live in London. We'll use this requirement to test each of the solution options later.

Against the physical schema the query would look like:

select employee.job, salary.salary, address.address
from employee,
salary,
address
where employee.employee_id = salary.employee_Id
and emplyoee.employee_id = address.employee_id
and CURRENT_DATE between employee.start_date and employee.end_date
and CURRENT_DATE between salary.start_date and salary.end_date
and CURRENT_DATE between address.start_date and address.end_date
and address.city = 'London'
You can also get this information at any point in time by changing CURRENT_DATE to the date you want to report for.

As well as allowing the user to write a Request across multiple physical tables, we gave ourselves the objective of creating a Business Model so that they would only have to enter a single condition (filter) for the date they wish to report for (as the above query shows in SQL or Discoverer the user would have to enter one per table. In the actual application there are around ten detail tables and the Employee table has another three pairs of dates so this can get cumbersome!)

Option 1

The Business Model was created as follows

Image002

1) In order to allow the user to create a single filter in Answers to get data from all three tables at a specified point in time I created a couple of conformed dimensions: 'Start Date' and 'End Date'. The attributes have sources for each physical table. This also enables you to conform the physical date track attribute names as they are different in each of the detail tables e.g. Salary.Period Start/End Date, Employee.Effective Start/End Date, Address.Change Date/Change End Date and they have been conformed to Start/End Date.

Image003

2) In order that I could join the Employee table to these new dimensions and ensure that any filters defined using these dimensions were applied to the Employee data I created a dummy Employee fact. This fact contained all of the attributes from the physical Employee Table.
  1. Employee contains a single attribute – the Employee Id which is the Employee PK.

  2. Salary and Address contained the same attributes as their physical sources.

  3. For each fact it is important to set the 'Logical Level' for each dimension in the source content, otherwise you may get Consistency warnings and certain queries will error.

Image004

So can this model be used to fulfil our test requirement?. Initially, the problem I had was that the moment you include non measure fact attributes (any of the textual verbose attributes) in a multi fact Request OBIEE returns NULLs for facts 2..N as it thinks you are performing an invalid query. Thinking about this from a dimensional modelling perspective I believe that OBIEE has this restriction because potentially you could be mixing different fact table grains. Obviously this is not always the case, if OBIEE allowed you to model the fact attributes and whether they are functionally dependant on conformed dimension PKs then it could work out whether the query was valid or not i.e. a valid query across two facts is one that includes any of the following:

Conformed Dimensions + Verbose fact attributes that are functionally dependant on a Conformed Dimension PK + Fact measures

For example, if you have a dimension whose PK is Order Id, Item Id, an Orders Fact with an Order Qty measure, a Despatches Fact with Despatch Status and a Despatch Qty measure and Despatch Status is functionally dependant on the Order PK i.e. One Order Item can have only one Despatch Status then it is perfectly valid to create a query with all of these attributes. OBIEE however will not query the Orders fact if Despatch Status is included in the Request.

I guess this all goes back to the point that OBIEE does not like you to include non measure attributes as facts. Will this change in 11g I wonder?

So the workaround I implemented is to trick OBIEE into thinking these verbose attributes are measures by applying the 'min' aggregate function (it could be any aggregate function that works with text). As long as you include the Employee PK (it is important that this is defined in the Employee dimension as described in 3 above) in each Request the results are correct because you are effectively aggregating only one record per Employee. OBIEE generates individual queries for each fact included in the query and uses an in memory stitch join to generate the final result set which is obviously not the most efficient method. Whilst this workaround produces the correct results, when you start to include these attributes in filters the SQL generated resolves the filter using a having clause which again is not the most efficient method i.e. if these attributes did not have an aggregate function defined against them a where clause would be the most efficient method.

Actual SQL:

select dim.Employee PK, min(Address) from Employee dim, Address fact
where dim.Employee PK = fact.Employee PK
group by Employee PK
having min(City) = 'London'

Preferred Performant SQL:


select dim.Employee PK, min(Address) from Employee dim, Address fact
where dim.Employee PK = fact.Employee PK
and City = 'London'
group by Employee PK

OBIEE best practice is clear: Facts should only consist of measures and textual attributes should be modelled as dimensions. Therefore the obvious solution was to create dummy unconformed dimensions for Address and Salary. These would contain all the verbose attributes from the physical sources. This works fine up to a point - if you include the attribute in a filter the SQL generated is of the preferred performant kind. The problem I am left with is that this only works for single fact Requests. Again, if you think about it this makes perfect sense because the dimension is unconformed.

There is a further issue when you try and include the 'Start Date' or 'End Date' in the Request Criteria (rather than just as a filter). For example if you wanted to report the full salary history for each Employee. The Request Criteria is shown below:

Image005

The results are rather surprising as it returns the UNION of all Start/End Date pairs from the Employee and Salary data as shown below.

Image006-1

Records that show the Employee Name populated correspond to the Employee Start and End Dates and those with the Salary populated correspond to the Salary Start and End Dates (Oracle HR uses an End Date of 31/12/4712 for the current details). This unexpected capability is extremely valuable as it allows the user to create a complete view of an Employee's history across multiple datasets.

What about if you only want to see the Salary periods? The only way to get this to work is to model the Employee Name in the Employee dimension instead of the dummy fact. Unfortunately this approach will cause the test requirement to generate invalid results as the Start/End Date filter will not be applied to the Employee data as we would not be including any attributes from the Employee dummy fact. It seems undesirable to include the Employee Name in both the dimension and dummy fact as this will be confusing to the user.

One more point about conformed dimensions. If you just create a Request with Start Date it will only return data from the first source defined for the dimension in the Business Model. This is a problem as ideally the results should contain the distinct set of values from all sources. I'm also currently having problems creating a multi level dimension hierarchy, but that is a WIP!

Option 2

This option is a useful reference as it demonstrates some OBIEE tricks and things you will certainly not find documented. I have Venkat to thank for outlining this solution (for those of you that don't know his blog it is always a good place to start when trying to understand anything OBIEE http://oraclebizint.wordpress.com and now http://www.rittmanmead.com/blog).

The Business Model was created as follows:

  1. The Employee, Salary and Address physical tables were denormalised to create a single logical dimension called Employee. This was achieved by dragging the physical Employee table across to the Business Model and then mapping this source to the Salary and Address detail tables.

Image007

I then dragged the Salary and Address columns over from the physical tables. In order to differentiate the Start and End Dates I renamed them to indicate which dataset they related to.
  1. Because each Logical table must join to at least one other table, I created a dummy fact which contained a single measure that counted the Employee PK. I don't understand why OBIEE does not allow standalone tables in the Business Model, there are many reasons why you would want to present a standalone table to the user and having to create these dummy tables just clutters up the model.

  2. Now the really clever bit. As you can see from point 1), the source is called 'Employee – Current'. That is because there are actually two sources defined for the Employee physical table, the other being called 'Employee – History'. You create this second source by simply copying and pasting the initial source. The difference between these sources is that 'Employee – Current' has a filter defined against it to only return the current information i.e. the filter includes each CURRENT_DATE predicate from the test requirement query in the opening section, whereas 'Employee – History' has no filter.

Two logical attributes are then created:

  • 'Current Indicator' which is only mapped to the 'Employee – Current' source.

Image008

  • 'History Indicator' which is only mapped to the 'Employee – History' source.

Image009

This is a really elegant solution for reporting the current information. When the user includes the 'Current Indicator' in a Request, because it is only mapped to the 'Employee – Current' source OBIEE then applies the filter defined against this source and only the current information is returned. To report full history they simply include the 'History Indicator'.

And what about our test requirement? The main benefit this solution has over the first is that because the verbose attributes are modelled as dimensions predicates are applied as where clauses rather than having clauses and are therefore more performant.

Also a single query is used to return the results which is again more efficient. This is as a result of mapping the sources as described in 1) above. This metadata definition ensures that OBIEE will query the source tables together using a single query rather than Option 1 which issues separate queries for each source.

The main disadvantage with this solution is that to report details at any point in time, the user must define a filter for each dataset they wish to include. One workaround for this would be to set a session variable from a Dashboard Prompt that is referenced in the filter for the 'Employee – History' source, but this would not be possible from a Request. However, having separate pairs of dates does allow the user to perform more flexible analysis as it gives them the capability to query the Employee data at different points in time. For example if they were trying to see if there was a correlation between people that attended a specific course in Q1 and whether the course contributed to them receiving an end of year pay rise.

I would of course be interested to hear from anyone that has other solutions or answers to any of the outstanding limitations and problems I've outlined in this blog."

[Steve can be contacted at [email protected], and you can leave a comment on this blog post - MR]