Oracle BI EE 11g – Sparse Lookups & Left Outer Joins

As i had mentioned here, one of the most important features of BI EE 11g is the ability to do ETL kind of Lookup functions directly from Repository. In the Training Days event that we ran in London & Atlanta recently, the most common question that we got was with regard to the usage of Sparse Lookups. The question was, "What is the difference between Sparse Lookups and normal Left Outer Joins that we do in the repository?". In this blog entry we shall see the difference between a normal left outer join and a sparse Lookup. Also, we shall be seeing the advantages of using Sparse Lookups over normal outer joins.

Lets take a very simple example as shown below. In this example we have 3 tables, all providing attributes & keys to a single dimension. To keep it simple, i also have the fact measure located in the same main table.

The Main table has 2 columns as shown below (with 3 rows of data - assume this table to be like a Job Aspirants listing table containing the current salary & other details of the Job Aspirant. Assume Job Aspirant Name & Current Salary are the 2 columns).

This main table provides a KEY column which will be the dimension primary key(Job Aspirant name in the Job Aspirants listing table). The Fact that we shall be using will be the MEASURE column(Job Aspirant salary). Lets assume this Main table joins to 2 other Tables that provide attributes for this dimension. The Data for these 2 tables are shown below

The MAINTABLE joins to the OUTERJOINTABLE through the KEY column(Job Aspirant Table joins to the intermediary current Aspirant-Job matching table). And OUTERJOINTABLE joins to the OUTERJOINATTRIBUTE table through the KEY2 column(Job Aspirant-Job matching table joining to an external Jobs Attributes table to get the details about the job). The MAINTABLE will be a master table and hence can have more KEY values than the other 2 tables. Now the requirement is to ensure that we always display all the KEY values irrespective of what we have in other 2 tables(i.e  show all the Job Aspirants, their current Salary and all their current Job details - Remember not all Job Aspirants will have a Job and hence we need to show the unemployed Aspirants as well). In 10g, we would approach this as shown below

All that we have done is, we have enabled a Left Outer Join between the MAINTABLE & OUTERJOINTABLE. Lets now create a simple report and see the resulting query

WITH
SAWITH0 AS (select sum(T84914.MEASURE) as c1,
     T84914.KEY as c2
from

          SH.MAINTABLE T84914 left outer join
SH.OUTERJOINTABLE T84917 On T84914.KEY = T84917.KEY,
     SH.OUTERJOINTABLEATTRIBUTE T84921
where  ( T84917.KEY2 = T84921.KEY2 )
group by T84914.KEY)
select distinct 0 as c1,
     D1.c2 as c2,
     D1.c1 as c3
from
     SAWITH0 D1

There are 2 major issues with this approach

  1. We have lost the effect of Outer Join due to the second equi Join(we don't see the KEY value 300 which did not have an attribute). There was no way in 10g we could specify/control the order of Joins.

  2. In the query we have an unnecessary join even though we queried only on the KEY & the FACT column. This is a characteristic of doing Left Outer Joins in a LTS (whenever we have an outer join all the underlying tables get included into the query irrespective of the columns chosen in the report)

To negate the issue #1, we would have to use other techniques like pushing the equi join inside a database or a Opaque View so that the Outer Join is performed last. There are other techniques as well but all techniques are generally dependent on the data source & the table joins. Ideally, to resolve this BI EE should provide a way of controlling the order of Joins & also provide an option to supply an incremental filter which will be applied internally in the form of something like ATTRIBUTE(+) = 'A' (not available currently)

To negate issue #2, we would be adding an extra LTS without the Left Outer Join and then add an LTS with the Outer Join. So, whenever we choose the KEY column alone, the first LTS will kick in and the equi-join will not come in the query. But once we bring in the ATTRIBUTE column, we will still have the Issue #1 where we lose the effect of the Outer Join.

To resolve all the above mentioned issues we can use the SPARSE Lookup function in 11g. We start with Modeling a new lookup table in the BMM layer as shown below. This will contain ATTRIBUTE and the KEY columns as shown below. The LTS of the lookup table will contain the OUTERJOIN and OUTERJOINATTRIBUTE tables.

Then in the ATTRIBUTE column of the dimension use the function below

Lookup(SPARSE "11gSparseLookup"."Lookup"."ATTRIBUTE" ,
'Null',
"11gSparseLookup"."Dimension"."KEY"
)

If we now create a report containing the KEY column, we should see all the rows of the main dimension. Same will be the case when we include the ATTRIBUTE column as well

The underlying query will be as shown below. Due to the lookup, we now see 2 separate queries & the outer join will be done in BI Server memory.