So How Do Star Transformations Actually Work?
Although like most Oracle data warehousing developers, I'm aware of what a star transformation is, and I can usually spot one in an execution plan. Earlier in December though I was on a customer engagement and one of the client's developers asked some deceptively simple questions about star transformations, which on the face of it I should have known but in reality made me think a bit. The developer had a web and transactional background, knew about join types such as nested loops and hash joins, but was wondering:
- Firstly, if with nested loop joins, Oracle starts at the smallest dimension table, joins it to the fact table, and then progressively joins the results of this to the other dimension tables, why is a star transformation, that joins all dimension tables to the fact table at once, more efficient?
- Secondly, what exactly goes on during a star transformation?;
- Thirdly, how do you interpret a star transformation execution plan?,
- And fourthly, when did Oracle star adding messages to execution plans to tell you a star transformation took place?
To start off, I create a copy of the SH.CUSTOMERS and SH.PRODUCTS tables, and two copies of the SH.SALES table. I then add primary key constraints to the two dimension tables. foreign key constraints to the fact tables, and then create bitmap indexes on one of the fact tables followed by b-tree indexes on the other fact table. Finally, I gather statistics on the tables and indexes so that I'm then ready to start the tests. The version of Oracle is 11.1.0.6 on Windows XP SP2.
SQL> set echo on SQL> set autotrace traceonly SQL> set pages 400 SQL> set lines 500 SQL> -- first of all, create copies of the customers, products and sales tables SQL> -- with indexes and constraints applied as required SQL> drop table sales_star; Table dropped. SQL> drop table customers_star; Table dropped. SQL> drop table products_star; Table dropped. SQL> -- first, the tables SQL> create table sales_star 2 as 3 select * from sh.sales; Table created. SQL> create table customers_star 2 as 3 select * from sh.customers; Table created. SQL> create table products_star 2 as 3 select * from sh.products; Table created. SQL> -- then the required constraints SQL> alter table customers_star add constraint cust_star_pk primary key (cust_id); Table altered. SQL> alter table products_star add constraint prod_star_pk primary key (prod_id); Table altered. SQL> alter table sales_star add constraint sales_prod_star_fk foreign key (prod_id) references products_star; Table altered. SQL> alter table sales_star add constraint sales_cust_star_fk foreign key (cust_id) references customers_star; Table altered. SQL> -- then the bitmap indexes on the fact table foreign key columns SQL> create bitmap index sales_star_cust_bix on sales_star(cust_id); Index created. SQL> create bitmap index sales_star_prod_bix on sales_star(prod_id); Index created. SQL> -- then the bitmap indexes on the dimension attributes SQL> create bitmap index customers_star_gender_bix on customers_star(cust_gender); Index created. SQL> create bitmap index customers_star_city_bix on customers_star(cust_city); Index created. SQL> create bitmap index products_star_subcategory_bix on products_star(prod_subcategory_desc); Index created. SQL> -- finally, gather table and index statistics SQL> analyze table sales_star compute statistics for table for all indexes for all indexed columns; Table analyzed. SQL> analyze table customers_star compute statistics for table for all indexes for all indexed columns; Table analyzed. SQL> analyze table products_star compute statistics for table for all indexes for all indexed columns; Table analyzed.Ok, so we're ready to roll. If I first run an ALTER SESSION command to turn off star transformations, and I then run a typical star query against the fact table with regular b-tree indexes on the dimension key columns, my execution plan looks like this:
SQL> alter session set star_transformation_enabled='false'; Session altered. SQL> -- run a sample query, with star transformations disabled for the b-tree indexed fact table SQL> select sum(quantity_sold), p.prod_subcategory_desc, c.cust_gender 2 from sales_no_star s, products_star p, customers_star c 3 where s.prod_id = p.prod_id 4 and s.cust_id = c.cust_id 5 and p.prod_subcategory_desc = 'Memory' 6 and c.cust_city = 'Oxford' 7 and c.cust_gender = 'F' 8 group by p.prod_subcategory_desc, c.cust_gender; Execution Plan ---------------------------------------------------------- Plan hash value: 3223803986 ------------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 50 | 282 (1)| 00:00:04 | | 1 | SORT GROUP BY NOSORT | | 1 | 50 | 282 (1)| 00:00:04 | |* 2 | HASH JOIN | | 104 | 5200 | 282 (1)| 00:00:04 | | 3 | TABLE ACCESS BY INDEX ROWID | CUSTOMERS_STAR | 29 | 406 | 9 (0)| 00:00:01 | | 4 | BITMAP CONVERSION TO ROWIDS | | | | | | | 5 | BITMAP AND | | | | | | |* 6 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_STAR_CITY_BIX | | | | | |* 7 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_STAR_GENDER_BIX | | | | | | 8 | NESTED LOOPS | | | | | | | 9 | NESTED LOOPS | | 25523 | 897K| 272 (0)| 00:00:04 | | 10 | TABLE ACCESS BY INDEX ROWID | PRODUCTS_STAR | 2 | 32 | 2 (0)| 00:00:01 | | 11 | BITMAP CONVERSION TO ROWIDS| | | | | | |* 12 | BITMAP INDEX SINGLE VALUE | PRODUCTS_STAR_SUBCATEGORY_BIX | | | | | |* 13 | INDEX RANGE SCAN | SALES_NO_STAR_PROD_BIX | 12762 | | 26 (0)| 00:00:01 | | 14 | TABLE ACCESS BY INDEX ROWID | SALES_NO_STAR | 12762 | 249K| 135 (0)| 00:00:02 | ------------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("S"."CUST_ID"="C"."CUST_ID") 6 - access("C"."CUST_CITY"='Oxford') 7 - access("C"."CUST_GENDER"='F') 12 - access("P"."PROD_SUBCATEGORY_DESC"='Memory') 13 - access("S"."PROD_ID"="P"."PROD_ID") Statistics ---------------------------------------------------------- 139 recursive calls 0 db block gets 511 consistent gets 0 physical reads 0 redo size 569 bytes sent via SQL*Net to client 416 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processedSo if we read through the execution plan, starting at the right-most, top-most line (line 10), we can see the following steps happening:
- The PRODUCTS_STAR table is accessed via a bitmap index on the subcategory column, with two rows being estimated as being returned (steps 10,11 and 12)
- The filtered rows from the PRODUCTS_STAR table are then nested loop joined to the SALES_NO_STAR table, with the PRODUCTS_STAR table driving (steps 8,9, 13 and 14)
- Around 25,000 rows are returned from this join and are then hash joined to the CUSTOMERS_STAR table (steps 2 to 7), returning around 211 rows
- This data is then aggregated to give us the final results.
A variation on this is when we still have star transformations disabled, but we change the b-tree indexes on the fact table to bitmap indexes, in preparation for running a full star transformation. In this case, the execution plan looks a little different, it's still not a star transformation but what it looks like is something not unlike an old-fashioned star join, where the two dimension tables are cartesian joined together before the results are then nested loop joined to the fact table.
SQL> -- now run it for the bitmap indexed fact table SQL> select sum(quantity_sold), p.prod_subcategory_desc, c.cust_gender 2 from sales_star s, products_star p, customers_star c 3 where s.prod_id = p.prod_id 4 and s.cust_id = c.cust_id 5 and p.prod_subcategory_desc = 'Memory' 6 and c.cust_city = 'Oxford' 7 and c.cust_gender = 'F' 8 group by p.prod_subcategory_desc, c.cust_gender; Execution Plan ---------------------------------------------------------- Plan hash value: 3116684012 ------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 50 | 205 (0)| 00:00:03 | | 1 | SORT GROUP BY NOSORT | | 1 | 50 | 205 (0)| 00:00:03 | | 2 | NESTED LOOPS | | | | | | | 3 | NESTED LOOPS | | 104 | 5200 | 205 (0)| 00:00:03 | | 4 | MERGE JOIN CARTESIAN | | 58 | 1740 | 23 (5)| 00:00:01 | | 5 | TABLE ACCESS BY INDEX ROWID | PRODUCTS_STAR | 2 | 32 | 2 (0)| 00:00:01 | | 6 | BITMAP CONVERSION TO ROWIDS | | | | | | |* 7 | BITMAP INDEX SINGLE VALUE | PRODUCTS_STAR_SUBCATEGORY_BIX | | | | | | 8 | BUFFER SORT | | 29 | 406 | 21 (0)| 00:00:01 | | 9 | TABLE ACCESS BY INDEX ROWID | CUSTOMERS_STAR | 29 | 406 | 23 (5)| 00:00:01 | | 10 | BITMAP CONVERSION TO ROWIDS| | | | | | | 11 | BITMAP AND | | | | | | |* 12 | BITMAP INDEX SINGLE VALUE| CUSTOMERS_STAR_CITY_BIX | | | | | |* 13 | BITMAP INDEX SINGLE VALUE| CUSTOMERS_STAR_GENDER_BIX | | | | | | 14 | BITMAP CONVERSION TO ROWIDS | | | | | | | 15 | BITMAP AND | | | | | | |* 16 | BITMAP INDEX SINGLE VALUE | SALES_STAR_CUST_BIX | | | | | |* 17 | BITMAP INDEX SINGLE VALUE | SALES_STAR_PROD_BIX | | | | | | 18 | TABLE ACCESS BY INDEX ROWID | SALES_STAR | 2 | 40 | 205 (0)| 00:00:03 | ------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 7 - access("P"."PROD_SUBCATEGORY_DESC"='Memory') 12 - access("C"."CUST_CITY"='Oxford') 13 - access("C"."CUST_GENDER"='F') 16 - access("S"."CUST_ID"="C"."CUST_ID") 17 - access("S"."PROD_ID"="P"."PROD_ID") Statistics ---------------------------------------------------------- 139 recursive calls 0 db block gets 353 consistent gets 36 physical reads 0 redo size 569 bytes sent via SQL*Net to client 416 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 1 rows processedNow if you look at the rows, bytes and cost columns in the execution plan, the cost of the plan is about 39% lower than the previous one, with the amount of bytes being scanned looking considerably less than the previous example. Step 4 in the execution plan shows a MERGE JOIN CARTESIAN that to me looks like it's cartesian joining the two dimension tables. Turning to Page 258 of Jonathan Lewis' Cost-Based Oracle Fundamentals I can see this is the old behavior of Oracle before star transformations were available, and could well be used when multi-column indexes (or apparently, multiple bitmap indexes) are available on a fact table and the cartesian product of the dimension tables isn't too large. Whichever way, it's one other way of combining fact and dimension tables when star transformations aren't available, and it could well be what you see if you've set things up for star transformations but not actually turned them on (see this other article from the blog where we can also see them occurring), but they're not really what we're after.
Now if we turn on star transformations, here's the execution plan that we'll get:
SQL> -- now run the same query again, this time with star transformations enabled SQL> alter session set star_transformation_enabled='true'; Session altered. SQL> select sum(quantity_sold), p.prod_subcategory_desc, c.cust_gender 2 from sales_star s, products_star p, customers_star c 3 where s.prod_id = p.prod_id 4 and s.cust_id = c.cust_id 5 and p.prod_subcategory_desc = 'Memory' 6 and c.cust_city = 'Oxford' 7 and c.cust_gender = 'F' 8 group by p.prod_subcategory_desc, c.cust_gender; Execution Plan ---------------------------------------------------------- Plan hash value: 2728756496 ----------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 41 | 27 (4)| 00:00:01 | | 1 | TEMP TABLE TRANSFORMATION | | | | | | | 2 | LOAD AS SELECT | SYS_TEMP_0FD9D660C_56C9C5 | | | | | | 3 | TABLE ACCESS BY INDEX ROWID | CUSTOMERS_STAR | 29 | 406 | 9 (0)| 00:00:01 | | 4 | BITMAP CONVERSION TO ROWIDS | | | | | | | 5 | BITMAP AND | | | | | | |* 6 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_STAR_CITY_BIX | | | | | |* 7 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_STAR_GENDER_BIX | | | | | | 8 | HASH GROUP BY | | 1 | 41 | 18 (6)| 00:00:01 | |* 9 | HASH JOIN | | 1 | 41 | 18 (6)| 00:00:01 | |* 10 | HASH JOIN | | 1 | 36 | 15 (0)| 00:00:01 | | 11 | TABLE ACCESS BY INDEX ROWID | PRODUCTS_STAR | 2 | 32 | 2 (0)| 00:00:01 | | 12 | BITMAP CONVERSION TO ROWIDS | | | | | | |* 13 | BITMAP INDEX SINGLE VALUE | PRODUCTS_STAR_SUBCATEGORY_BIX | | | | | | 14 | TABLE ACCESS BY INDEX ROWID | SALES_STAR | 13 | 260 | 13 (0)| 00:00:01 | | 15 | BITMAP CONVERSION TO ROWIDS | | | | | | | 16 | BITMAP AND | | | | | | | 17 | BITMAP MERGE | | | | | | | 18 | BITMAP KEY ITERATION | | | | | | | 19 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660C_56C9C5 | 1 | 13 | 2 (0)| 00:00:01 | |* 20 | BITMAP INDEX RANGE SCAN | SALES_STAR_CUST_BIX | | | | | | 21 | BITMAP MERGE | | | | | | | 22 | BITMAP KEY ITERATION | | | | | | | 23 | TABLE ACCESS BY INDEX ROWID | PRODUCTS_STAR | 2 | 32 | 2 (0)| 00:00:01 | | 24 | BITMAP CONVERSION TO ROWIDS| | | | | | |* 25 | BITMAP INDEX SINGLE VALUE | PRODUCTS_STAR_SUBCATEGORY_BIX | | | | | |* 26 | BITMAP INDEX RANGE SCAN | SALES_STAR_PROD_BIX | | | | | | 27 | TABLE ACCESS FULL | SYS_TEMP_0FD9D660C_56C9C5 | 29 | 145 | 2 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 6 - access("C"."CUST_CITY"='Oxford') 7 - access("C"."CUST_GENDER"='F') 9 - access("S"."CUST_ID"="C0") 10 - access("S"."PROD_ID"="P"."PROD_ID") 13 - access("P"."PROD_SUBCATEGORY_DESC"='Memory') 20 - access("S"."CUST_ID"="C0") 25 - access("P"."PROD_SUBCATEGORY_DESC"='Memory') 26 - access("S"."PROD_ID"="P"."PROD_ID") Note ----- - star transformation used for this statement Statistics ---------------------------------------------------------- 127 recursive calls 10 db block gets 221 consistent gets 1 physical reads 1464 redo size 569 bytes sent via SQL*Net to client 416 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processedSo what's happening here then? Well for a start, the cost of the plan has gone down to 41, from the 205 and 282 shown by the previous two execution plans. The largest number of rows we ever process is 29, with a maximum of 406 bytes of data compared to 897k of data in on of the other execution plans. Just looking at this plan compared to the other two, it certainly looks a lot more efficient to the other two, but what exactly is going on when the star transformation happens? Well, one complicating factor in the execution plan is the presence of a temporary table, so I'll set the star transformation parameter to "temp_disable" to remove this, and the execution plan should be a bit easier to read:
SQL> -- show the execution plan with temporary table disabled SQL> alter session set star_transformation_enabled='temp_disable'; Session altered. SQL> select sum(quantity_sold), p.prod_subcategory_desc, c.cust_gender 2 from sales_star s, products_star p, customers_star c 3 where s.prod_id = p.prod_id 4 and s.cust_id = c.cust_id 5 and p.prod_subcategory_desc = 'Memory' 6 and c.cust_city = 'Oxford' 7 and c.cust_gender = 'F' 8 group by p.prod_subcategory_desc, c.cust_gender; Execution Plan ---------------------------------------------------------- Plan hash value: 2728442436 ---------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 50 | 25 (4)| 00:00:01 | | 1 | SORT GROUP BY NOSORT | | 1 | 50 | 25 (4)| 00:00:01 | |* 2 | HASH JOIN | | 1 | 50 | 25 (4)| 00:00:01 | |* 3 | HASH JOIN | | 1 | 36 | 15 (0)| 00:00:01 | | 4 | TABLE ACCESS BY INDEX ROWID | PRODUCTS_STAR | 2 | 32 | 2 (0)| 00:00:01 | | 5 | BITMAP CONVERSION TO ROWIDS | | | | | | |* 6 | BITMAP INDEX SINGLE VALUE | PRODUCTS_STAR_SUBCATEGORY_BIX | | | | | | 7 | TABLE ACCESS BY INDEX ROWID | SALES_STAR | 13 | 260 | 13 (0)| 00:00:01 | | 8 | BITMAP CONVERSION TO ROWIDS | | | | | | | 9 | BITMAP AND | | | | | | | 10 | BITMAP MERGE | | | | | | | 11 | BITMAP KEY ITERATION | | | | | | | 12 | TABLE ACCESS BY INDEX ROWID | CUSTOMERS_STAR | 29 | 406 | 9 (0)| 00:00:01 | | 13 | BITMAP CONVERSION TO ROWIDS| | | | | | | 14 | BITMAP AND | | | | | | |* 15 | BITMAP INDEX SINGLE VALUE| CUSTOMERS_STAR_CITY_BIX | | | | | |* 16 | BITMAP INDEX SINGLE VALUE| CUSTOMERS_STAR_GENDER_BIX | | | | | |* 17 | BITMAP INDEX RANGE SCAN | SALES_STAR_CUST_BIX | | | | | | 18 | BITMAP MERGE | | | | | | | 19 | BITMAP KEY ITERATION | | | | | | | 20 | TABLE ACCESS BY INDEX ROWID | PRODUCTS_STAR | 2 | 32 | 2 (0)| 00:00:01 | | 21 | BITMAP CONVERSION TO ROWIDS| | | | | | |* 22 | BITMAP INDEX SINGLE VALUE | PRODUCTS_STAR_SUBCATEGORY_BIX | | | | | |* 23 | BITMAP INDEX RANGE SCAN | SALES_STAR_PROD_BIX | | | | | | 24 | TABLE ACCESS BY INDEX ROWID | CUSTOMERS_STAR | 29 | 406 | 9 (0)| 00:00:01 | | 25 | BITMAP CONVERSION TO ROWIDS | | | | | | | 26 | BITMAP AND | | | | | | |* 27 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_STAR_CITY_BIX | | | | | |* 28 | BITMAP INDEX SINGLE VALUE | CUSTOMERS_STAR_GENDER_BIX | | | | | ---------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("S"."CUST_ID"="C"."CUST_ID") 3 - access("S"."PROD_ID"="P"."PROD_ID") 6 - access("P"."PROD_SUBCATEGORY_DESC"='Memory') 15 - access("C"."CUST_CITY"='Oxford') 16 - access("C"."CUST_GENDER"='F') 17 - access("S"."CUST_ID"="C"."CUST_ID") 22 - access("P"."PROD_SUBCATEGORY_DESC"='Memory') 23 - access("S"."PROD_ID"="P"."PROD_ID") 27 - access("C"."CUST_CITY"='Oxford') 28 - access("C"."CUST_GENDER"='F') Note ----- - star transformation used for this statement Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 276 consistent gets 0 physical reads 0 redo size 569 bytes sent via SQL*Net to client 416 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processedOK, so that's a bit clearer. So what's going on then? Well, the first thing to understand about a star transformation is that it's not a new kind of join, it's more of a join strategy. When Oracle is allowed to consider star transformations and your tables are set up correctly (bitmap indexes on the fact table dimension key columns, constraints declared between the tables, cost-based optimizer used and so on), what happens is that your query is rewritten from, in my example, something like this:
select sum(quantity_sold), p.prod_subcategory_desc, c.cust_gender from sales_star s, products_star p, customers_star c where s.prod_id = p.prod_id and s.cust_id = c.cust_id and p.prod_subcategory_desc = 'Memory' and c.cust_city = 'Oxford' and c.cust_gender = 'F' group by p.prod_subcategory_desc, c.cust_gender;to this:
select sum(quantity_sold), p.prod_subcategory_desc, c.cust_gender from sales_star s, products_star p, customers_star c where s.prod_id in (select pr.prod_id from products pr where pr.prod_subcategory_desc = 'Memory') and s.cust_id in (select cu.cust_id from customers cu where cu.cust_city = 'Oxford' and cu.cust_gender = 'F') and s.prod_id = p.prod_id and s.cust_id = c.cust_id group by p.prod_subcategory_desc, c.cust_gender;Now this functionally returns the same data as the previous version of the query, but crucially it allows the cost-based optimizer to execute the query in the following way:
- Firstly, the dimension key values from the dimension tables are located for the filters specified, and these are then used to locate the relevant fact table rows (per dimension) using bitmap indexes on the fact table dimension columns. In the fourth execution plan, this happens in steps 10 through to 23.
To my mind, the crucial thing to be aware of in the star transformation join strategy is the way that the bitmap indexes on the fact table are used. By creating these subqueries in the rewriten version of the SQL statement, Oracle can identify which rows in the individual fact table bitmap indexes are of interest, and the nature of bitmap indexes means that they can then be logically ANDed, OR'd and so on very quickly to identify exactly which fact table rows are of interest to us, just Oracle does when it filters a dimension table on more than one bitmap indexed column. Once the fact table has been filtered in this way it's then much cheaper to join the result to the individual dimension tables, and this is why a star transformation can be much more efficient than a series of nested loop joins between the fact table and the dimension tables - you end up filtering the fact table much earlier on and therefore end up shifting much less data around to answer your query.
So, to me that answers the questions "what does a star transformation actually do?" (it rewrites your query so that the fact table can be filtered using a logical AND of the bitmap indexed dimension key columns), and it also explains what's going on in a star transformation explain plan compared to a regular explain plan. One question it didn't answer though was when Oracle started adding the "star transformation used for this statement" message to explain plans - looking back at my previous examples, I started seeing it added to explain plans back from Oracle 10gR1, I'm not sure if it appeared in 9iR2 or earlier. I'm also not sure how accurate it is - I'm sure I've seen explain plans in 10gR2 that looked very much like a star transformation had taken place, but this message didn't appear as part of the plan output (I used DBMS_XPLAN.DISPLAY if I remember correctly). If anyone knows any more about this feature and when it started appearing, then let me know.