Oracle BI EE 11g - Skipped & Ragged Hierarchies - Grouping Sets & Grouping IDs
As Mark had blogged about here, a very important new feature of the BI EE 11g is the ability to support ragged & skipped hierarchies in a normal level based hierarchy. Though it does look straight-forward in terms of functionality and also the setup, enabling these 2 options basically changes the underlying SQL Queries generated. In this blog post, we shall see how the skipped & ragged hierarchies work.
To illustrate i will be using a very simple hierarchy as shown below
In this hierarchy, the topmost node A has an unbalanced/ragged node called C.
It has also a skipped node called F as represented below
From a data standpoint this is how they are represented in my database tables.
The above table basically joins to my fact table using the SK_ID. When we enable the Skipped & Ragged options in BI EE 11g and then drill on the member A, you would notice 2 differing behaviors.
They are
1. BI EE has automatically obtained the data for Level 3 member F (with proper indentation to signify that it is at Level 3) though we have just drilled only to Level 2. So, enabling the skipped/ragged options makes BI Server to fire another query for Level 3. In our case it goes to Level 3 but we could have lower level skipped data coming in from any level depending on how the hierarchy is setup. So, BI EE will have to parse all the lower levels below and extract the level that has data.
2. BI EE automatically has disabled the + sign for the ragged member C. So, even in this case it has to parse all the lower levels to know that it is indeed a ragged member.
How does BI EE parse the lower levels? Does it do the usual UNION ALL to get at the lower level data? What if we had 15 levels in our hierarchy? Will BI EE be generating 15 UNION ALL SQL statements even when we are at Level 2?
The answer to all the above questions is; BI EE does all of the above by using a combination of GROUPING SETS and GROUPING_ID sql constructs. Lets look at the actual SQL Query generated by BI EE in the use case that i have showed above.
WITH
SACOMMON18747 AS (select sum(T86163.FACT) as c1,
max(T86158.LEVEL1) as c2,
max(T86158.LEVEL2) as c3,
max(T86158.LEVEL3) as c4,
T86158.LEVEL1 as c5,
T86158.LEVEL2 as c6,
T86158.LEVEL3 as c7,
grouping_id(T86158.LEVEL1,
T86158.LEVEL2, T86158.LEVEL3) as c8
from
SKIPRAGGED T86158,
SKIPRAGGED_FACT T86163
where ( T86158.SK_ID = T86163.SK_ID )
group by grouping sets (
(),
(T86158.LEVEL1),
(T86158.LEVEL2, T86158.LEVEL1),
(T86158.LEVEL3, T86158.LEVEL2, T86158.LEVEL1))),
SAWITH0 AS ((select distinct 0 as c1,
lpad(' ', 1) as c2,
lpad(' ', 1) as c3,
lpad(' ', 1) as c4,
'Dimension Total' as c5,
1 as c6,
lpad(' ', 1) as c7,
lpad(' ', 1) as c8,
lpad(' ', 1) as c9,
D1.c1 as c10,
case when D1.c2 is null and D1.c3 is null
and D1.c4 is null then 1 else 0 end as c11
from
SACOMMON18747 D1
where ( D1.c8 = 7 )
union all
select distinct 1 as c1,
D1.c5 as c2,
lpad(' ', 1) as c3,
lpad(' ', 1) as c4,
'Dimension Total' as c5,
1 as c6,
D1.c5 as c7,
lpad(' ', 1) as c8,
lpad(' ', 1) as c9,
D1.c1 as c10,
case when D1.c3 is null and D1.c4 is null
then 1 else 0 end as c11
from
SACOMMON18747 D1
where ( D1.c8 = 3 and D1.c5 is not null )
union all
select distinct 2 as c1,
D1.c5 as c2,
D1.c6 as c3,
lpad(' ', 1) as c4,
'Dimension Total' as c5,
1 as c6,
D1.c5 as c7,
D1.c6 as c8,
lpad(' ', 1) as c9,
D1.c1 as c10,
case when D1.c4 is null then 1 else 0 end as c11
from
SACOMMON18747 D1
where ( D1.c8 = 1 and D1.c6 is not null and D1.c5 is null )
union all
select distinct 3 as c1,
D1.c5 as c2,
D1.c6 as c3,
D1.c7 as c4,
'Dimension Total' as c5,
1 as c6,
D1.c5 as c7,
D1.c6 as c8,
D1.c7 as c9,
D1.c1 as c10,
case when 1 = 1 then 1 else 0 end as c11
from
SACOMMON18747 D1
where ( D1.c8 = 0 and D1.c7 is not null
and D1.c5 is null and D1.c6 is null ) ))
select D1.c1 as c1,
D1.c2 as c2,
D1.c3 as c3,
D1.c4 as c4,
D1.c5 as c5,
D1.c6 as c6,
D1.c7 as c7,
D1.c8 as c8,
D1.c9 as c9,
D1.c10 as c10,
D1.c11 as c11
from
SAWITH0 D1
order by c1, c5, c2 NULLS FIRST, c3 NULLS FIRST, c4 NULLS FIRST
If you notice, the first part of the sql uses a new kind of GROUP BY statement called as GROUPING SETS. GROUPING SETS were introduced in Oracle Database 9i (i think) where essentially multiple GROUP BY operations that have a dependency of using a common set of columns can be combined into a single query. Lets take a simplified version of this query and fire it in sqlplus.
select sum(T86163.FACT) as c1,
T86158.LEVEL1 as c5,
T86158.LEVEL2 as c6,
T86158.LEVEL3 as c7,
grouping_id(T86158.LEVEL1,
T86158.LEVEL2, T86158.LEVEL3) as c8
from
SKIPRAGGED T86158,
SKIPRAGGED_FACT T86163
where ( T86158.SK_ID = T86163.SK_ID )
group by grouping sets (
(),
(T86158.LEVEL1),
(T86158.LEVEL2, T86158.LEVEL1),
(T86158.LEVEL3, T86158.LEVEL2, T86158.LEVEL1))
The entire SQL above can be converted into individual UNION ALL queries as shown below
select sum(T86163.FACT) as c1,
T86158.LEVEL1 as c5,
T86158.LEVEL2 as c6,
T86158.LEVEL3 as c7,
'000' as Grouping_ID_Binary
from
SKIPRAGGED T86158,
SKIPRAGGED_FACT T86163
where ( T86158.SK_ID = T86163.SK_ID )
group by
(T86158.LEVEL3, T86158.LEVEL2, T86158.LEVEL1)
UNION ALL
select sum(T86163.FACT),
T86158.LEVEL1,
T86158.LEVEL2,
null,
'001'
from
SKIPRAGGED T86158,
SKIPRAGGED_FACT T86163
where ( T86158.SK_ID = T86163.SK_ID )
group by
(T86158.LEVEL2, T86158.LEVEL1)
UNION ALL
select sum(T86163.FACT),
T86158.LEVEL1,
null,
null,
'011'
from
SKIPRAGGED T86158,
SKIPRAGGED_FACT T86163
where ( T86158.SK_ID = T86163.SK_ID )
group by
(T86158.LEVEL1)
UNION ALL
select sum(T86163.FACT),
null,
null,
null,
'111'
from
SKIPRAGGED T86158,
SKIPRAGGED_FACT T86163
where ( T86158.SK_ID = T86163.SK_ID )
Basically the grouping id provides us with the grouping vector (binary representation of whether each column was grouped or not) so that we can differentiate between a null that comes as part of the data (skipped & ragged) and a null that comes as a result of the GROUPING SET operation. In the above query, we have 2 rows of data containing (A null null). The first row
is basically the result of Group BY only on Level 1. But the second row
is basically the result of Level 2 being null (though the Group By was on Level 1 and Level 2) due to our Skip Level hierarchy. BI EE uses the Grouping IDs to properly arrange the values in hierarchical order.
So what this means is, whenever you are enabling the Skip Level and Ragged Level options in BI EE, a group by will be done at every level of the dimension. This is many cases can affect performance of your reports quite a lot. Ensure that this option is not checked on a normal hierarchy. The GROUPING SETs option will work well if we have only one set of fact tables at lowermost grain. But what if we have Aggregate tables and what if we have non-oracle data sources like SQL Server or even excel? I will cover that in a future blog post.
Note: GROUPING SETS are not specific only to Skipped and Ragged Level hierarchies. They can be seen even in a normal hierarchy. But for Skipped and Ragged Level Hierarchies, it will come for sure as the query can become extremely large if normal GROUP BYs are used.