Inside the Oracle BI Server Part 3 : BI Server In-Memory Joins
In the previous two postings in this series, I looked at the architecture of the Oracle BI Server, and how it processes incoming queries from Oracle BI Answers. In the latter article I touched on the concept of BI Server in-memory joins, and in this article I want to expand on this topic and look at just what goes on when the BI Server is called upon to combine data from multiple sources.
When the BI Server executes a query plan, it handles the data in four separate stages:
- Firstly, filters and functions are applied to the data from each data source
- Then, the data from these data sources are aggregated as required
- Then they are joined together (or "stitched" together), and
- Then, any calculations and/or aggregations that are applied across data sources are applied
So given this capability, how does it work under the covers? When does the BI Server perform a join in-memory, and when does it get done at the underlying database level? Where can we see what is happening, and can we predict what method the BI Server will use when performing a join? Finally, what algorithm does the BI Server use when performing these joins, and how does it use memory and disk when during the process?
To illustrate how the process works, there are a number of join scenarios that we need to consider. Some relate to joining fact and dimension tables together, and others relate to joining fact tables that share conforming dimensions, or hold conforming data sets of differing granularity.
Joining Fact and Dimension Tables Together
The BI Server semantic layer requires you to organize your business model and mapping layer into a star schema. This star schema may have one or more logical dimension tables, that join to one or more logical fact tables. The logical fact tables typically have conforming dimensions, so that you can create requests that span multiple fact tables and multiple dimension tables.
Taking for the moment joins between fact and dimension tables, depending on how the underlying physical or logical table source joins are set up in the semantic model, these may be either inner joins, left outer joins, right outer joins or full outer joins. The simple example to consider is a business model that is mapped to a single physical database, so that all logical table sources point to the same underlying data source, as shown in the screenshot below:
In this case, if we issued a request against this business model that required data from a dimension table and a fact table, the BI Server would push the join between logical table sources down to the underlying database, a single SQL query would be generated and the execution plan from a level 5 query log entry would look like this:-------------------- Execution plan:The same would apply to a left outer join between table sources in the same database, a right outer join or a full outer join. The BI Server doesn't do any work here except to issue a single SQL query, and you can see just the one "RqList" (request list) in the execution plan, indicating again that the BI Server thinks it only needs to put together one query to satisfy the request.RqList <<2105>> [for database 3023:2820:orcl3,44]
PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:2820,44],
sum(SALES.QUANTITY_SOLD by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c2 GB [for database 3023:2820,44]
Child Nodes (RqJoinSpec): <<2136>> [for database 3023:2820:orcl3,44]
PRODUCTS T2874
SALES T2911
DetailFilter: PRODUCTS.PROD_ID = SALES.PROD_ID [for database 0:0]
GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC] [for database 3023:2820,44]
OrderBy: c1 asc [for database 3023:2820,44]
If, however, one of the logical dimension tables had its logical table source re-pointed to a separate physical database, as shown in the screenshot below, the BI Server would now have to do the join itself, as it can't be pushed down to the underlying database (as there are now two of them).
In this case, two SQL queries would be issued, one against each of the two physical databases, and the BI Server would do the join in-memory (or to disk, I'll elaborate on this later on). The corresponding logical execution plan from a level 5 log file would now look like this:-------------------- Execution plan:Notice the "InnerJoin <<2492>> On D901.c2 = D902.c2; actual join vectors: [ 0 ] = [ 1 ]" that is in the middle of the execution plan, between the two main Rqlists - this tells you that the BI Server is doing the join, as it would only appear here if it couldn't be pushed down to the underlying database. You might also find references to LeftOuterJoin, RightOuterJoin and FullOuterJoin here, depending on how the join between the tables is defined in the physical or logical table source joins in your semantic layer.RqBreakFilter <<2465>>[1] [for database 0:0,0]
RqList <<2466>> [for database 0:0,0]
D1.c2 as c1 [for database 3023:2500,44],
sum(D1.c5 by [ D1.c2] at_distinct [ D1.c2, D1.c3] ) as c2 [for database 0:0,0]
Child Nodes (RqJoinSpec): <<2478>> [for database 0:0,0]
(
RqList <<2482>> [for database 0:0,0]
D902.c1 as c2 GB [for database 3023:2500,44],
D901.c2 as c3 [for database 3023:132,44],
D901.c3 as c5 [for database 3023:132,44]
Child Nodes (RqJoinSpec): <<2490>> [for database 0:0,0](
RqList <<2495>> [for database 3023:132:orcl,44]
SALES.PROD_ID as c2 [for database 3023:132,44],
sum(SALES.QUANTITY_SOLD by [ SALES.PROD_ID] ) as c3 [for database 3023:132,44]
Child Nodes (RqJoinSpec): <<2504>> [for database 3023:132:orcl,44]
SALES T211
GroupBy: [ SALES.PROD_ID] [for database 3023:132,44]
OrderBy: c2 asc [for database 3023:132,44]
) as D901
InnerJoin <<2492>> On D901.c2 = D902.c2; actual join vectors: [ 0 ] = [ 1 ](
RqList <<2517>> [for database 3023:2500:orcl2,44]
PRODUCTS.PROD_SUBCATEGORY_DESC as c1 GB [for database 3023:2500,44],
PRODUCTS.PROD_ID as c2 [for database 3023:2500,44]
Child Nodes (RqJoinSpec): <<2523>> [for database 3023:2500:orcl2,44]
PRODUCTS T2502
OrderBy: c2 asc [for database 3023:2500,44]
) as D902
OrderBy: c2, c3 [for database 0:0,0]
) as D1
OrderBy: c1 asc [for database 0:0,0]
Joining Facts with Conforming Dimensions Together
Another situation occurs when you are joining fact tables together that share conforming dimensions. A simple example of this is where you create a request that requires data from two or more fact tables that share conforming dimensions, such as those shown in the screenshot below:
As requests such as these can potentially lead to "fan trap" issues (explained in this blog post), the BI Server knows that it has to generate two logical queries and join, or "stitch" them together to avoid the fan trap. If both fact tables are sourced from the same physical database, and this database supports subquery factoring (the "WITH" clause that you see in Oracle 10gR2/11g SQL statements) then it will generate the following execution plan, which has a FullOuterStitchJoin between the two inner RqList blocks:RqBreakFilter <<3571>>[3] [for database 0:0,0] RqList <<3462>> [for database 3023:2820:orcl3,46] D1.c1 as c1 GB [for database 3023:2820,46], D2.c1 as c2 GB [for database 3023:2820,46], case when D1.c2 is not null then D1.c2 when D2.c2 is not null then D2.c2 end as c3 GB [for database 3023:2820,46] Child Nodes (RqJoinSpec): <<3567>> [for database 3023:2820:orcl3,46] ( RqList <<3474>> [for database 3023:2820:orcl3,46] sum(COSTS.UNIT_COST by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c1 GB [for database 3023:2820,46], PRODUCTS.PROD_SUBCATEGORY_DESC as c2 GB [for database 3023:2820,46] Child Nodes (RqJoinSpec): <<3507>> [for database 3023:2820:orcl3,46] PRODUCTS T2874 COSTS T2830 DetailFilter: COSTS.PROD_ID = PRODUCTS.PROD_ID [for database 0:0] GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC] [for database 3023:2820,46] ) as D1 FullOuterStitchJoin <<3565>> On D1.c2 = D2.c2 ( RqList <<3511>> [for database 3023:2820:orcl3,46] sum(SALES.AMOUNT_SOLD by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c1 GB [for database 3023:2820,46], PRODUCTS.PROD_SUBCATEGORY_DESC as c2 GB [for database 3023:2820,46] Child Nodes (RqJoinSpec): <<3544>> [for database 3023:2820:orcl3,46] PRODUCTS T2874 SALES T2911 DetailFilter: PRODUCTS.PROD_ID = SALES.PROD_ID [for database 0:0] GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC] [for database 3023:2820,46] ) as D2 OrderBy: c3 asc [for database 3023:2820,46]The BI Server Navigator then generates a single SQL statement off of this execution plan, which queries both fact tables using subquery factoring, and then brings the results together in the main body of the statement:
-------------------- Sending query to database named orcl3 (id: <<3462>>): WITH SAWITH0 AS (select sum(T2830.UNIT_COST) as c1, T2874.PROD_SUBCATEGORY_DESC as c2 from PRODUCTS T2874, COSTS T2830 where ( T2830.PROD_ID = T2874.PROD_ID ) group by T2874.PROD_SUBCATEGORY_DESC), SAWITH1 AS (select sum(T2911.AMOUNT_SOLD) as c1, T2874.PROD_SUBCATEGORY_DESC as c2 from PRODUCTS T2874, SALES T2911 where ( T2874.PROD_ID = T2911.PROD_ID ) group by T2874.PROD_SUBCATEGORY_DESC) select distinct SAWITH0.c1 as c1, SAWITH1.c1 as c2, case when SAWITH0.c2 is not null then SAWITH0.c2 when SAWITH1.c2 is not null then SAWITH1.c2 end as c3 from SAWITH0 full outer join SAWITH1 On SAWITH0.c2 = SAWITH1.c2 order by c3If the physical database doesn't support subquery factoring, such as Oracle Database 10gR1 or higher, then the BI Server generates a slightly different execution plan, again with a FullOuterStitchJoin, like this:
-------------------- Execution plan:This is then resolved for this database into two separate SQL statements, which then joined "in-memory" together by the BI Server.RqBreakFilter <<3115>>[3] [for database 0:0,0]
RqList <<3006>> [for database 0:0,0]
D903.c1 as c1 GB [for database 3023:2820,44],
D903.c2 as c2 GB [for database 3023:2820,44],
case when D903.c3 is not null then D903.c3 when D903.c4 is not null then D903.c4 end as c3 GB [for database 3023:2820,44]
Child Nodes (RqJoinSpec): <<3117>> [for database 0:0,0]
(
RqList <<3160>> [for database 0:0,0]
D901.c1 as c1 GB [for database 3023:2820,44],
D902.c1 as c2 GB [for database 3023:2820,44],
D901.c2 as c3 [for database 3023:2820,44],
D902.c2 as c4 [for database 3023:2820,44]
Child Nodes (RqJoinSpec): <<3163>> [for database 0:0,0]( RqList <<3018>> [for database 3023:2820:orcl3,44] sum(COSTS.UNIT_COST by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c1 GB [for database 3023:2820,44], PRODUCTS.PROD_SUBCATEGORY_DESC as c2 GB [for database 3023:2820,44] Child Nodes (RqJoinSpec): <<3051>> [for database 3023:2820:orcl3,44] PRODUCTS T2874 COSTS T2830 DetailFilter: COSTS.PROD_ID = PRODUCTS.PROD_ID [for database 0:0] GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC] [for database 3023:2820,44] OrderBy: c2 asc [for database 3023:2820,44] ) as D901 FullOuterStitchJoin <<3109>> On D901.c2 = D902.c2; actual join vectors: [ 1 ] = [ 1 ] ( RqList <<3055>> [for database 3023:2820:orcl3,44] sum(SALES.AMOUNT_SOLD by [ PRODUCTS.PROD_SUBCATEGORY_DESC] ) as c1 GB [for database 3023:2820,44], PRODUCTS.PROD_SUBCATEGORY_DESC as c2 GB [for database 3023:2820,44] Child Nodes (RqJoinSpec): <<3088>> [for database 3023:2820:orcl3,44] PRODUCTS T2874 SALES T2911 DetailFilter: PRODUCTS.PROD_ID = SALES.PROD_ID [for database 0:0] GroupBy: [ PRODUCTS.PROD_SUBCATEGORY_DESC] [for database 3023:2820,44] OrderBy: c2 asc [for database 3023:2820,44] ) as D902 ) as D903 OrderBy: c3 asc [for database 0:0,0]
-------------------- Sending query to database named orcl3 (id: <<3018>>): select sum(T2830.UNIT_COST) as c1, T2874.PROD_SUBCATEGORY_DESC as c2 from PRODUCTS T2874, COSTS T2830 where ( T2830.PROD_ID = T2874.PROD_ID ) group by T2874.PROD_SUBCATEGORY_DESC order by c2 +++Administrator:2a0000:2a0005:----2010/02/28 15:05:31 -------------------- Sending query to database named orcl3 (id: <<3055>>): select sum(T2911.AMOUNT_SOLD) as c1, T2874.PROD_SUBCATEGORY_DESC as c2 from PRODUCTS T2874, SALES T2911 where ( T2874.PROD_ID = T2911.PROD_ID ) group by T2874.PROD_SUBCATEGORY_DESC order by c2
Joining Table Sources within a Logical Fact
Another situation is a fact table may have more than one logical table source, because individual measures are sourced from different data sources or perhaps measures may be mapped in at differing levels of granularity (this blog post describes such a scenario). In this case, again the BI Server will initially try and push the join down to the underlying database, something that may be possible if a single physical database is used and we can use a technique like subquery factoring; more likely though it will require the BI Server to issue two or more physical SQL statements and then bring the results back together again using a FullOuterStitchJoin.
-------------------- Execution plan: RqList <<7829>> [for database 0:0,0] D1.c1 as c1 [for database 0:0,0], D1.c2 as c2 [for database 0:0,0], D1.c3 as c3 [for database 0:0,0], D1.c4 as c4 [for database 3023:4210,44] Child Nodes (RqJoinSpec): <<7842>> [for database 0:0,0] ( RqList <<7809>> [for database 0:0,0] D1.c1 as c1 [for database 0:0,0], D1.c2 as c2 [for database 0:0,0], D1.c3 as c3 [for database 0:0,0], D1.c4 as c4 [for database 3023:4210,44], D1.c5 as c5 [for database 0:0,0] Child Nodes (RqJoinSpec): <<7824>> [for database 0:0,0] ( RqBreakFilter <<7808>>[1,2,5] [for database 0:0,0] RqList <<7604>> [for database 0:0,0] case when D903.c1 is not null then D903.c1 when D903.c2 is not null then D903.c2 end as c1 GB [for database 0:0,0], case when D903.c3 is not null then D903.c3 when D903.c4 is not null then D903.c4 end as c2 GB [for database 0:0,0], D903.c5 as c3 GB [for database 0:0,0], D903.c6 as c4 GB [for database 3023:4210,44], case when D903.c7 is not null then D903.c7 when D903.c8 is not null then D903.c8 end as c5 GB [for database 0:0,0] Child Nodes (RqJoinSpec): <<7844>> [for database 0:0,0] ( RqList <<7915>> [for database 0:0,0] D901.c1 as c1 [for database 0:0,0], D902.c1 as c2 [for database 3023:4210,44], D902.c2 as c3 [for database 3023:4210,44], D901.c2 as c4 [for database 0:0,0], D901.c3 as c5 GB [for database 0:0,0], D902.c3 as c6 GB [for database 3023:4210,44], D901.c4 as c7 [for database 0:0,0], D902.c4 as c8 [for database 3023:4210,44] Child Nodes (RqJoinSpec): <<7918>> [for database 0:0,0] ( RqList <<7851>> [for database 0:0,0] D1.c2 as c1 [for database 0:0,0], D1.c3 as c2 [for database 0:0,0], D1.c1 as c3 GB [for database 0:0,0], D1.c4 as c4 [for database 0:0,0] Child Nodes (RqJoinSpec): <<7854>> [for database 0:0,0] ( RqBreakFilter <<7687>>[2,3] [for database 0:0,0] RqList <<8040>> [for database 0:0,0] D1.c1 as c1 [for database 0:0,0], D1.c2 as c2 [for database 0:0,0], D1.c3 as c3 [for database 0:0,0], D1.c4 as c4 [for database 0:0,0] Child Nodes (RqJoinSpec): <<8058>> [for database 0:0,0] ( RqList <<7972>> [for database 3023:4483:Quotas,2] sum(QUANTITY_QUOTAS.QUOTA by [ CATEGORY.CATEGORY, MONTHS.MONTH_MON_YYYY] ) as c1 [for database 3023:4483,2], MONTHS.MONTH_MON_YYYY as c2 [for database 3023:4483,2], CATEGORY.CATEGORY as c3 [for database 3023:4483,2], MONTHS.MONTH_YYYYMM as c4 [for database 3023:4483,2] Child Nodes (RqJoinSpec): <<7682>> [for database 3023:4483:Quotas,2] CATEGORY T4486 MONTHS T4488 QUANTITY_QUOTAS T4492 DetailFilter: CATEGORY.CATEGORY = QUANTITY_QUOTAS.CATEGORY and MONTHS.MONTH_YYYYMM = QUANTITY_QUOTAS.MONTH_YYYYMM [for database 0:0] GroupBy: [ CATEGORY.CATEGORY, MONTHS.MONTH_YYYYMM, MONTHS.MONTH_MON_YYYY] [for database 3023:4483,2] ) as D1 OrderBy: c2, c3 [for database 0:0,0] ) as D1 OrderBy: c1 asc, c2 asc [for database 0:0,0] ) as D901 FullOuterStitchJoin <<7800>> On D901.c1 =NullsEqual D902.c1 and D901.c2 =NullsEqual D902.c2; actual join vectors: [ 0 1 ] = [ 0 1 ] ( RqList <<7880>> [for database 3023:4210:orcl4,44] D2.c2 as c1 [for database 3023:4210,44], D2.c3 as c2 [for database 3023:4210,44], D2.c1 as c3 GB [for database 3023:4210,44], D2.c4 as c4 [for database 3023:4210,44] Child Nodes (RqJoinSpec): <<7883>> [for database 3023:4210:orcl4,44] ( RqBreakFilter <<7760>>[2,3] [for database 3023:4210:orcl4,44] RqList <<7989>> [for database 3023:4210:orcl4,44] sum(ITEMS.QUANTITY by [ PRODUCT.CATEGORY, TIMES.MONTH_MON_YYYY] ) as c1 [for database 3023:4210,44], TIMES.MONTH_MON_YYYY as c2 [for database 3023:4210,44], PRODUCT.CATEGORY as c3 [for database 3023:4210,44], TIMES.MONTH_YYYYMM as c4 [for database 3023:4210,44] Child Nodes (RqJoinSpec): <<7755>> [for database 3023:4210:orcl4,44] PRODUCT T4256 TIMES T4264 ITEMS T4239 ORDERS T4248 DetailFilter: ITEMS.ORDID = ORDERS.ORDID and ITEMS.PRODID = PRODUCT.PRODID and ORDERS.ORDERDATE = TIMES.DAY_ID [for database 0:0] GroupBy: [ PRODUCT.CATEGORY, TIMES.MONTH_MON_YYYY, TIMES.MONTH_YYYYMM] [for database 3023:4210,44] ) as D2 OrderBy: c1 asc, c2 asc [for database 3023:4210,44] ) as D902 ) as D903 OrderBy: c1, c2, c5 [for database 0:0,0] ) as D1 OrderBy: c5 asc, c2 asc, c4 asc [for database 0:0,0] ) as D1Again, notice the FullOuterStitchJoin in the execution plan - this indicates that facts (as opposed to facts and dimensions) are being joined together.
This in turn leads to two separate SQL statements. The one against the "orcl" database is more complex because the results then need to be mapped to the aggregation level that the second source, "quotas", comes in at:
-------------------- Sending query to database named Quotas (id: <<7972>>): select sum(T4492."QUOTA") as c1, T4488."MONTH_MON_YYYY" as c2, T4486."CATEGORY" as c3, T4488."MONTH_YYYYMM" as c4 from "CATEGORY" T4486, "MONTHS" T4488, "QUANTITY_QUOTAS" T4492 where ( T4486."CATEGORY" = T4492."CATEGORY" and T4488."MONTH_YYYYMM" = T4492."MONTH_YYYYMM" ) group by T4486."CATEGORY", T4488."MONTH_YYYYMM", T4488."MONTH_MON_YYYY" +++Administrator:2b0000:2b000a:----2010/02/24 17:18:51 -------------------- Sending query to database named orcl4 (id: <<7880>>): select D2.c2 as c1, D2.c3 as c2, D2.c1 as c3, D2.c4 as c4 from (select D1.c1 as c1, D1.c2 as c2, D1.c3 as c3, D1.c4 as c4 from (select sum(T4239.QUANTITY) as c1, T4264.MONTH_MON_YYYY as c2, T4256.CATEGORY as c3, T4264.MONTH_YYYYMM as c4, ROW_NUMBER() OVER (PARTITION BY T4256.CATEGORY, T4264.MONTH_MON_YYYY ORDER BY T4256.CATEGORY ASC, T4264.MONTH_MON_YYYY ASC) as c5 from PRODUCT T4256, TIMES T4264, ITEMS T4239, ORDERS T4248 where ( T4239.ORDID = T4248.ORDID and T4239.PRODID = T4256.PRODID and T4248.ORDERDATE = T4264.DAY_ID ) group by T4256.CATEGORY, T4264.MONTH_MON_YYYY, T4264.MONTH_YYYYMM ) D1 where ( D1.c5 = 1 ) ) D2 order by c1, c2So, to summarize things so far:
- Where possible, the BI Server will try and generate a single SQL statement to resolve a request
- And if possible, any joins that are required between tables will be pushed down to the database
- If table data sources are located on separate physical databases, the BI Server will request the individual data source data blocks, and then join the results together in-memory using an inner, left outer, right outer or full outer join as appropriate
- If facts (or measures within a fact) are being joined together, the BI Server will need to generate one logical query per logical table source, and bring the data together with a full outer stitch join
- As mentioned above, if it's possible to do this stitch join at the database level (using, for example, a WITH clause), it'll do so
- Otherwise the BI Server will generate separate SQL statements and join the data together in-memory
Fragmented Data Sources
Another variation on a join that the BI Server can do is a "union" between two queries. This is most common when you have fragmented data sources, such as the example below where part of the data in the sales table comes from one table, and part from another.
-------------------- Execution plan: RqList <<7569>> [for database 3023:6594:orcl7,44] D3.c2 as c1 GB [for database 3023:6594,44], sum(D3.c3 by [ D3.c2] ) as c2 GB [for database 3023:6594,44] Child Nodes (RqJoinSpec): <<7695>> [for database 3023:6594:orcl7,44] ( RqList <<7613>> [for database 3023:6594:orcl7,44] PRODUCTS.PROD_SUBCATEGORY_DESC as c2 [for database 3023:6594,44], SALES_UPTO_2003.AMOUNT_SOLD as c3 [for database 3023:6594,44] Child Nodes (RqJoinSpec): <<7617>> [for database 3023:6594:orcl7,44] PRODUCTS T6596 SALES T6629 DetailFilter: PRODUCTS.PROD_ID = SALES_UPTO_2003.PROD_ID [for database 0:0] RqUnion All <<7690>> [for database 3023:6594:orcl7,44] RqList <<7668>> [for database 3023:6594:orcl7,44] PRODUCTS.PROD_SUBCATEGORY_DESC as c2 [for database 3023:6594,44], SALES_BEYOND_2003.AMOUNT_SOLD as c3 [for database 3023:6594,44] Child Nodes (RqJoinSpec): <<7672>> [for database 3023:6594:orcl7,44] PRODUCTS T6596 SALES T6637 DetailFilter: PRODUCTS.PROD_ID = SALES_BEYOND_2003.PROD_ID [for database 0:0] ) as D3 GroupBy: [ D3.c2] [for database 3023:6594,44] OrderBy: c1 asc [for database 3023:6594,44]Then, depending on whether the BI Server can resolve this using a single query or multiple queries against separate data source, either a single SQL statement like the one below will be issued, or separate statements will be issued and the BI Server will do the union all in memory.
select D3.c2 as c1, sum(D3.c3) as c2 from ((select T6596.PROD_SUBCATEGORY_DESC as c2, T6629.AMOUNT_SOLD as c3 from PRODUCTS T6596, SALES T6629 /* SALES_UPTO_2003 */ where ( T6596.PROD_ID = T6629.PROD_ID ) union all select T6596.PROD_SUBCATEGORY_DESC as c2, T6637.AMOUNT_SOLD as c3 from PRODUCTS T6596, SALES T6637 /* SALES_BEYOND_2003 */ where ( T6596.PROD_ID = T6637.PROD_ID ) ) ) D3 group by D3.c2 order by c1Driving Tables (Parameterized Nested Loop Joins)
I mentioned in the paragraph above that BI Server joins are typically done using the sort-merge algorithm. One variation on this though is when you set one of the two tables in a business model and mapping logical join to be a driving table, typically because you are federating fact and dimension tables and one table is much smaller than the other, as shown in the screenshot below.
The first thing to understand with driving tables is that they are regarded as a "hint" by the BI Server, and the BI Server may well choose to ignore the setting if it makes more sense to perform the join as normal (presumably, when both tables are relatively small). If the driving table instruction is followed, though, the BI Server will always do the join in-memory, even if both tables come from logical table sources pointing to the same physical database. In the execution plan shown below, you can see the InnerJoin (left drive) that indicates a parameterized nested loop join (PNLJ) will be required, and as the name suggests the BI Server will perform a nested loop join rather than the sort-merge join that it usually uses to join tables together.-------------------- Execution plan: RqBreakFilter <<8705>>[1] [for database 0:0,0] RqList <<8972>> [for database 0:0,0] D1.c2 as c1 [for database 3023:2500,44], sum(D1.c5 by [ D1.c2] at_distinct [ D1.c2, D1.c3] ) as c2 [for database 0:0,0] Child Nodes (RqJoinSpec): <<8984>> [for database 0:0,0] ( RqList <<8463>> [for database 0:0,0] D901.c1 as c2 GB [for database 3023:2500,44], D902.c2 as c3 [for database 3023:5035,44], D902.c3 as c5 [for database 3023:5035,44] Child Nodes (RqJoinSpec): <<8707>> [for database 0:0,0]Then then leads to the following parameterized SQL statements being issued, with the first statement representing the "driving" query, and the second the "probing" one against the larger table.( RqList <<8757>> [for database 3023:2500:orcl2,44] PRODUCTS.PROD_NAME as c1 GB [for database 3023:2500,44], PRODUCTS.PROD_ID as c2 [for database 3023:2500,44] Child Nodes (RqJoinSpec): <<8760>> [for database 3023:2500:orcl2,44] PRODUCTS T2502 DetailFilter: PRODUCTS.PROD_NAME = '128MB Memory Card' or PRODUCTS.PROD_NAME = '3 1/2" Bulk diskettes, Box of 100' or PRODUCTS.PROD_NAME = '5MP Telephoto Digital Camera' or PRODUCTS.PROD_NAME = '64MB Memory Card' or PRODUCTS.PROD_NAME = 'Deluxe Mouse' or PRODUCTS.PROD_NAME = 'Envoy Ambassador' or PRODUCTS.PROD_NAME = 'Envoy External 8X CD-ROM' or PRODUCTS.PROD_NAME = 'Martial Arts Champions' or PRODUCTS.PROD_NAME = 'Model A3827H Black Image Cartridge' or PRODUCTS.PROD_NAME = 'Model C93822D Wireless Phone Battery' or PRODUCTS.PROD_NAME = 'Model CD13272 Tricolor Ink Cartridge' or PRODUCTS.PROD_NAME = 'PCMCIA modem/fax 28800 baud' or PRODUCTS.PROD_NAME = 'SIMM- 16MB PCMCIAII card' or PRODUCTS.PROD_NAME = 'Smash up Boxing' or PRODUCTS.PROD_NAME = 'Unix/Windows 1-user pack' [for database 0:0] OrderBy: c2 asc [for database 3023:2500,44] ) as D901 InnerJoin (left drive) <<8806>> On D901.c2 = D902.c2; actual join vectors: [ 1 ] = [ 0 ] ( RqList <<8790>> [for database 3023:5035:orcl5,44] SALES.PROD_ID as c2 [for database 3023:5035,44], sum(SALES.AMOUNT_SOLD by [ SALES.PROD_ID] ) as c3 [for database 3023:5035,44] Child Nodes (RqJoinSpec): <<8793>> [for database 3023:5035:orcl5,44] SALES T5126 DetailFilter: SALES.PROD_ID = ?1 or SALES.PROD_ID = ?2 or SALES.PROD_ID = ?3 or SALES.PROD_ID = ?4 or SALES.PROD_ID = ?5 or SALES.PROD_ID = ?6 or SALES.PROD_ID = ?7 or SALES.PROD_ID = ?8 or SALES.PROD_ID = ?9 or SALES.PROD_ID = ?10 or SALES.PROD_ID = ?11 or SALES.PROD_ID = ?12 or SALES.PROD_ID = ?13 or SALES.PROD_ID = ?14 or SALES.PROD_ID = ?15 or SALES.PROD_ID = ?16 or SALES.PROD_ID = ?17 or SALES.PROD_ID = ?18 or SALES.PROD_ID = ?19 or SALES.PROD_ID = ?20 [for database 0:0] GroupBy: [ SALES.PROD_ID] [for database 3023:5035,44] OrderBy: c2 asc [for database 3023:5035,44] ) as D902 OrderBy: c2, c3 [for database 0:0,0] ) as D1 OrderBy: c1 asc [for database 0:0,0]
-------------------- Sending query to database named orcl2 (id: <<8757>>): select T2502.PROD_NAME as c1, T2502.PROD_ID as c2 from PRODUCTS T2502 where ( T2502.PROD_NAME in ('128MB Memory Card', '3 1/2" Bulk diskettes, Box of 100', '5MP Telephoto Digital Camera', '64MB Memory Card', 'Deluxe Mouse', 'Envoy Ambassador', 'Envoy External 8X CD-ROM', 'Martial Arts Champions', 'Model A3827H Black Image Cartridge', 'Model C93822D Wireless Phone Battery', 'Model CD13272 Tricolor Ink Cartridge', 'PCMCIA modem/fax 28800 baud', 'SIMM- 16MB PCMCIAII card', 'Smash up Boxing', 'Unix/Windows 1-user pack') ) order by c2 +++Administrator:2c0000:2c000a:----2010/02/24 21:06:47 -------------------- Sending query to database named orcl5 (id: <<8790>>): select T5126.PROD_ID as c2, sum(T5126.AMOUNT_SOLD) as c3 from SALES T5126 where ( T5126.PROD_ID in (:PARAM1, :PARAM2, :PARAM3, :PARAM4, :PARAM5, :PARAM6, :PARAM7, :PARAM8, :PARAM9, :PARAM10, :PARAM11, :PARAM12, :PARAM13, :PARAM14, :PARAM15, :PARAM16, :PARAM17, :PARAM18, :PARAM19, :PARAM20) ) group by T5126.PROD_ID order by c2In reality you rarely see driving table joins being used as there are much better solutions to bringing together small and large tables together - the main one being to co-locate the tables and then push the join down to the database, rather than bring both datasets together and have the BI Server join them in memory instead (this also applies to a lesser degree to all BI Server joins). But this could be a useful "quick fix" until such time as you can co-locate the data, and its useful to remember that these types of joins are always done by the BI Server due to the need to iterate through drive/probe operations.
Persist Connnection Pools
One final variation on BI Server execution plans and join types is when you set up a "persist connection pool". Persist connection pools are typically used in two scenarios; firstly, where Oracle/Siebel Marketing is being used, and secondly, where the underlying physical database doesn't handle large numbers of values in an IN-list. In this case, you can set up a second connection pool within a physical database and specify it as the persist connection pool, as shown in the screenshot below:
I've never encountered a persist connection pool "in the wild", so to speak, but an example query log output from when one was used is shown below. In this instance, the first query was sent to a MS Analysis Services database, and a persist connection pool was used to materialize the in-list results into a database table which is then joined back to the ORDERS table in the final query, rather than have the BI Server do the join in-memory.-------------------- Sending query to database named FoodMart (id: <<10980>>): With member [Measures].[YearAnc] as 'ancestor([Time].Currentmember,[Time].[Year]).name' set [Q] as '{{[Time].[Year].members}}' select {[measures].[YearAnc]} on columns, {[Q]} on rows from [Sales]-------------------- Sending query to database named SQLDB_Northwind (id: CreateTable TransGateway):
CREATE TABLE TTCH5C5DEL554110000020000003 ( column1 VARCHAR2(8) )-------------------- Sending query to database named SQLDB_Northwind (id: <<11057>>):
select distinct TO_NUMBER(TO_CHAR(T1864.OrderDate, 'yyyy'), '9999') as c1
from
Orders T1864
where ( TO_NUMBER(TO_CHAR(T1864.OrderDate, 'yyyy'), '9999') in (select column1 from TTCH5C5DEL554110000020000003) )
Conclusions
So, there you have it. The join strategy of the BI Server, as is the case with functions and calculations, is to wherever possible push them down to the underlying database. If this can't be done, because either the database version doesn't support features like subquery factoring, or if the data for the request is being sourced from more than one physical databas, the BI Server will do the join itself, initially in-memory but usually with temporary data being paged to disk.
There are two main types of BI Server join; regular (inner, left outer, right outer and fullouter) joins for bringing together fact and dimension tables; and full outer stitch joins, for bringing together facts and measures. There are also variations for handling joins from very small tables to very large tables (driving tables, or parameterized nested loop joins), or when the physical database doesn't support large in-lists, however these issues are usually better handled by co-locating data or upgrading the database.
Finally, even though the BI Server is pretty clever at doing these types of joins, you're usually better trying to invest your time in physically bringing your data together into a data mart or data warehouse than spending too much time fine-tuning these joins, though a knowledge of how they work (and how to read a level 5 execution plan) can be useful if you have to understand, or tune, an existing system in-place. Of course the level 5 execution plan doesn't really tell you anything you couldn't determine by looking at the design of the RPD - there's nothing that goes on beyond this that might change the execution plan for a certain set of data, unlike the Oracle database which changes the plan from database to database depending on the distribution and nature of the data - but its interesting to get a peek into the workings of the BI Server Navigator module.