Playing Around with Star Transformations and Bitmap Indexes
I’ve been doing some work this week with a client in London, who wants to put together a reporting data mart based on Oracle Database 10gR1. Although they and I were fairly sure a dimensional database design using bitmap indexes and star transformations would give them the query performance they required, there were a number of questions we had at the start that I wanted to get resolved:
- How easy would it be to get a star transformation to take place - in earlier versions of Oracle (8i, 9i) star transformations seemed a bit temperamental and I wanted to see how easily these took place in the latest version of the database.
- What do the execution plans look like for queries against the star schema with STAR_TRANSFORMATION_ENABLED set to TRUE and FALSE, how do these compare to execution plans for the same table with regular non-unique B*Tree indexes, and did the queries that used star transformations really return data that much faster?
- What was the difference in size, and time to create, for bitmap indexes compared to regular non-unique indexes?
- If we had a column with a bitmap index on it and we then updated the column (unusual, but this is one of their requirements), by how much did this increase the size of the index and how fast was the update compared to updating a column with a regular non-unique index on it? Also, if we were running a select against the same table, to what extent would the query be slowed down compared to running the same query against a table containing regular non-unique indexes (i.e. what is the impact of row-locking when updating a column with a bitmap index on it)
The data itself wouldn’t be available for another week or so, so I created a test user and copied across data from the SH Sample Schema. To create a representative set of data, I copied the SH.SALES table into itself a few times to give it around 7m rows, and created two versions of the table, SALES_IDX on which I’d create regular non-unique indexes and SALES_BIX, which would have the bitmap indexes on its dimension key columns. Then, I copied across the SH.CUSTOMERS, SH.PRODUCTS, SH.TIMES, SH.CHANNELS and SH.COUNTRIES tables across, created primary key constraints and unique indexes on their ID columns, bitmap indexes on their attribute columns, bitmap indexes on the dimension key columns in SALES_BIX and non-unique indexes on the corresponding SALES_IDX columns, and then gathered statistics on the schema in readiness for testing.
SQL> desc sales_idx Name Null? Type ----------------------------- -------- ------------ PROD_ID NOT NULL NUMBER CUST_ID NOT NULL NUMBER TIME_ID NOT NULL DATE CHANNEL_ID NOT NULL NUMBER PROMO_ID NOT NULL NUMBER QUANTITY_SOLD NOT NULL NUMBER(10,2) AMOUNT_SOLD NOT NULL NUMBER(10,2) SQL> desc sales_bix Name Null? Type ----------------------------- -------- ------------ PROD_ID NOT NULL NUMBER CUST_ID NOT NULL NUMBER TIME_ID NOT NULL DATE CHANNEL_ID NOT NULL NUMBER PROMO_ID NOT NULL NUMBER QUANTITY_SOLD NOT NULL NUMBER(10,2) AMOUNT_SOLD NOT NULL NUMBER(10,2) SQL> select table_name 2 , index_name 3 , index_type 4 from user_indexes 5 where table_name like 'SALES%' 6 / TABLE_NAME INDEX_NAME INDEX_TYPE -------------------- -------------------- ------------ SALES_BIX TIME_ID_BIX BITMAP SALES_BIX CUST_ID_BIX BITMAP SALES_BIX CHANNEL_ID_BIX BITMAP SALES_BIX PROD_ID_BIX BITMAP SALES_IDX PROD_ID_IDX NORMAL SALES_IDX CUST_ID_IDX NORMAL SALES_IDX TIME_ID_IDX NORMAL SALES_IDX CHANNEL_ID_IDX NORMAL 8 rows selected. SQL> begin 2 3 DBMS_STATS.GATHER_SCHEMA_STATS ( 4 ownname => 'SH_STAR_TEST', 5 estimate_percent => 10 6 ); 7 end; 8 / PL/SQL procedure successfully completed.
Note that if you’re familiar with the SH schema you’ll have noticed that I’ve missed out on indexing the PROMOTIONS dimension - I forgot about this during the testing but as the schema doesn’t exist in the test schema I set up, and I don’t reference it in joins, it’ll effectively be ignored and treated as if it’s a fact table measure, so we can disregard it.
So now all the data was there, I ran a few select statements to see how the execution plan came out, firstly with STAR_TRANSFORMATION_ENABLED set to FALSE, and then to TRUE, against both the bitmap-indexed and regular-indexed tables. The results were interesting.
SQL> select sum(amount_sold), count(*) 2 from sales_bix s, customers c, channels ch, 3 times t, products p 4 where s.prod_id = p.prod_id 5 and s.time_id = t.time_id 6 and s.channel_id = ch.channel_id 7 and s.cust_id = c.cust_id 8 and c.cust_state_province = 'CA' 9 and t.calendar_year = 1998 10 and t.calendar_month_number = 11 11 and ch.channel_id = 2 12 and p.prod_category_desc = 'Hardware'; SUM(AMOUNT_SOLD) COUNT(*) ---------------- ---------- 84735.44 56 Elapsed: 00:00:00.39 Execution Plan ---------------------------------------------------------- Plan hash value: 3700521350 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 80 | 698 (1)| 00:00:09 | | 1 | SORT AGGREGATE | | 1 | 80 | | | |* 2 | HASH JOIN | | 601 | 48080 | 698 (1)| 00:00:09 | | 3 | TABLE ACCESS BY INDEX ROWID | SALES_BIX | 20 | 500 | 546 (1)| 00:00:07 | 4 | NESTED LOOPS | | 1181 | 75584 | 546 (1)| 00:00:07 | | 5 | MERGE JOIN CARTESIAN | | 60 | 2340 | 27 (4)| 00:00:01 | | 6 | NESTED LOOPS | | 2 | 48 | 3 (0)| 00:00:01 | |* 7 | INDEX UNIQUE SCAN | CHANNELS_PK | 1 | 3 | 0 (0)| 00:00:01 | |* 8 | TABLE ACCESS FULL | PRODUCTS | 2 | 42 | 3 (0)| 00:00:01 | | 9 | BUFFER SORT | | 30 | 450 | 24 (5)| 00:00:01 | | 10 | TABLE ACCESS BY INDEX ROWID | TIMES | 30 | 450 | 27 (4)| 00:00:01 | | 11 | BITMAP CONVERSION TO ROWIDS| | | | | | | 12 | BITMAP AND | | | | | | |* 13 | BITMAP INDEX SINGLE VALUE| CALENDAR_MONTH_NUMBER_BIX | | | | | |* 14 | BITMAP INDEX SINGLE VALUE| CALENDAR_YEAR_BIX | | | | | | 15 | BITMAP CONVERSION TO ROWIDS | | | | | | | 16 | BITMAP AND | | | | | | |* 17 | BITMAP INDEX SINGLE VALUE | TIME_ID_BIX | | | | | |* 18 | BITMAP INDEX SINGLE VALUE | PROD_ID_BIX | | | | | |* 19 | BITMAP INDEX SINGLE VALUE | CHANNEL_ID_BIX | | | | | |* 20 | VIEW | index$_join$_002 | 3506 | 56096 | 151 (3)| 00:00:02 | |* 21 | HASH JOIN | | | | | | | 22 | BITMAP CONVERSION TO ROWIDS | | 3506 | 56096 | 2 (0)| 00:00:01 | |* 23 | BITMAP INDEX SINGLE VALUE | CUST_STATE_PROVINCE_BIX | | | | | 24 | INDEX FAST FULL SCAN | CUSTOMERS_PK | 3506 | 56096 | 148 (3)| 00:00:02 | ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("S"."CUST_ID"="C"."CUST_ID") 7 - access("CH"."CHANNEL_ID"=2) 8 - filter("P"."PROD_CATEGORY_DESC"='Hardware') 13 - access("T"."CALENDAR_MONTH_NUMBER"=11) 14 - access("T"."CALENDAR_YEAR"=1998) 17 - access("S"."TIME_ID"="T"."TIME_ID") 18 - access("S"."PROD_ID"="P"."PROD_ID") 19 - access("S"."CHANNEL_ID"=2) 20 - filter("C"."CUST_STATE_PROVINCE"='CA') 21 - access(ROWID=ROWID) 23 - access("C"."CUST_STATE_PROVINCE"='CA')
That's as expected, the query uses the bitmap indexes, does a number of hash joins and nested loop joins and returns the data in 0.39 seconds (not bad for 7m fact table rows) with a cost of 698. Running the query against the fact table with only regular non-unique indexes returns the following autotrace output:
SQL> select sum(amount_sold), count(*) 2 from sales_idx s, customers c, channels ch, 3 times t, products p 4 where s.prod_id = p.prod_id 5 and s.time_id = t.time_id 6 and s.channel_id = ch.channel_id 7 and s.cust_id = c.cust_id 8 and c.cust_state_province = 'CA' 9 and t.calendar_year = 1998 10 and t.calendar_month_number = 11 11 and ch.channel_id = 2 12 and p.prod_category_desc = 'Hardware'; SUM(AMOUNT_SOLD) COUNT(*) ---------------- ---------- 84735.44 56 Elapsed: 00:00:20.96 Execution Plan ---------------------------------------------------------- Plan hash value: 2003962144 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 80 | 3175 (1)| 00:00:39 | | 1 | SORT AGGREGATE | | 1 | 80 | | | |* 2 | HASH JOIN | | 598 | 47840 | 3175 (1)| 00:00:39 | |* 3 | HASH JOIN | | 1176 | 75264 | 3023 (1)| 00:00:37 | | 4 | TABLE ACCESS BY INDEX ROWID | TIMES | 30 | 450 | 11 (0)| 00:00:01 | | 5 | BITMAP CONVERSION TO ROWIDS| | | | | | | 6 | BITMAP AND | | | | | | |* 7 | BITMAP INDEX SINGLE VALUE| CALENDAR_MONTH_NUMBER_BIX | | | | | |* 8 | BITMAP INDEX SINGLE VALUE| CALENDAR_YEAR_BIX | | | | | |* 9 | TABLE ACCESS BY INDEX ROWID | SALES_IDX | 28620 | 698K| 1504 (1)| 00:00:19 | | 10 | NESTED LOOPS | | 57239 | 2738K| 3011 (1)| 00:00:37 | | 11 | NESTED LOOPS | | 2 | 48 | 3 (0)| 00:00:01 | |* 12 | INDEX UNIQUE SCAN | CHANNELS_PK | 1 | 3 | 0 (0)| 00:00:01 | |* 13 | TABLE ACCESS FULL | PRODUCTS | 2 | 42 | 3 (0)| 00:00:01 | |* 14 | INDEX RANGE SCAN | PROD_ID_IDX | 102K| | 214 (2)| 00:00:03 | |* 15 | VIEW | index$_join$_002 | 3506 | 56096 | 151 (3)| 00:00:02 | |* 16 | HASH JOIN | | | | | | | 17 | BITMAP CONVERSION TO ROWIDS| | 3506 | 56096 | 2 (0)| 00:00:01 | |* 18 | BITMAP INDEX SINGLE VALUE | CUST_STATE_PROVINCE_BIX | | | | | | 19 | INDEX FAST FULL SCAN | CUSTOMERS_PK | 3506 | 56096 | 148 (3)| 00:00:02 | ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("S"."CUST_ID"="C"."CUST_ID") 3 - access("S"."TIME_ID"="T"."TIME_ID") 7 - access("T"."CALENDAR_MONTH_NUMBER"=11) 8 - access("T"."CALENDAR_YEAR"=1998) 9 - filter("S"."CHANNEL_ID"=2) 12 - access("CH"."CHANNEL_ID"=2) 13 - filter("P"."PROD_CATEGORY_DESC"='Hardware') 14 - access("S"."PROD_ID"="P"."PROD_ID") 15 - filter("C"."CUST_STATE_PROVINCE"='CA') 16 - access(ROWID=ROWID) 18 - access("C"."CUST_STATE_PROVINCE"='CA')
OK, so this time the query took around 20 seconds with a cost of 3175, about ten times the cost of the query using bitmap indexes, around what I expected.
Now I enabled star transformations and ran the same queries again. Firstly, the table with bitmap indexes on it:
SQL> alter session set star_transformation_enabled = true; Session altered. Elapsed: 00:00:00.00 SQL> select sum(amount_sold), count(*) 2 from sales_bix s, customers c, channels ch, 3 times t, products p 4 where s.prod_id = p.prod_id 5 and s.time_id = t.time_id 6 and s.channel_id = ch.channel_id 7 and s.cust_id = c.cust_id 8 and c.cust_state_province = 'CA' 9 and t.calendar_year = 1998 10 and t.calendar_month_number = 11 11 and ch.channel_id = 2 12 and p.prod_category_desc = 'Hardware'; SUM(AMOUNT_SOLD) COUNT(*) ---------------- ---------- 84735.44 56 Elapsed: 00:00:01.08 Execution Plan ---------------------------------------------------------- Plan hash value: 3776350793 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 31 | 201 (5)| 00:00:03 | | 1 | SORT AGGREGATE | | 1 | 31 | | | | 2 | NESTED LOOPS | | 60 | 1860 | 201 (5)| 00:00:03 | |* 3 | INDEX UNIQUE SCAN | CHANNELS_PK | 1 | 3 | 0 (0)| 00:00:01 | | 4 | TABLE ACCESS BY INDEX ROWID | SALES_BIX | 60 | 1680 | 201 (5)| 00:00:03 | | 5 | BITMAP CONVERSION TO ROWIDS | | | | | | | 6 | BITMAP AND | | | | | | | 7 | BITMAP MERGE | | | | | | | 8 | BITMAP KEY ITERATION | | | | | | |* 9 | TABLE ACCESS FULL | PRODUCTS | 2 | 42 | 3 (0)| 00:00:01 | |* 10 | BITMAP INDEX RANGE SCAN | PROD_ID_BIX | | | | | | 11 | BITMAP MERGE | | | | | | | 12 | BITMAP KEY ITERATION | | | | | | | 13 | TABLE ACCESS BY INDEX ROWID | TIMES | 30 | 450 | 11 (0)| 00:00:01 | | 14 | BITMAP CONVERSION TO ROWIDS | | | | | | | 15 | BITMAP AND | | | | | | |* 16 | BITMAP INDEX SINGLE VALUE | CALENDAR_MONTH_NUMBER_BIX | | | | | |* 17 | BITMAP INDEX SINGLE VALUE | CALENDAR_YEAR_BIX | | | | | |* 18 | BITMAP INDEX RANGE SCAN | TIME_ID_BIX | | | | | |* 19 | BITMAP INDEX SINGLE VALUE | CHANNEL_ID_BIX | | | | | | 20 | BITMAP MERGE | | | | | | | 21 | BITMAP KEY ITERATION | | | | | | |* 22 | VIEW | index$_join$_017 | 3506 | 56096 | 151 (3)| 00:00:02 | |* 23 | HASH JOIN | | | | | | | 24 | BITMAP CONVERSION TO ROWIDS| | 3506 | 56096 | 2 (0)| 00:00:01 | |* 25 | BITMAP INDEX SINGLE VALUE | CUST_STATE_PROVINCE_BIX | | | | | | 26 | INDEX FAST FULL SCAN | CUSTOMERS_PK | 3506 | 56096 | 148 (3)| 00:00:02 | |* 27 | BITMAP INDEX RANGE SCAN | CUST_ID_BIX | | | | | ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("CH"."CHANNEL_ID"=2) 9 - filter("P"."PROD_CATEGORY_DESC"='Hardware') 10 - access("S"."PROD_ID"="P"."PROD_ID") 16 - access("T"."CALENDAR_MONTH_NUMBER"=11) 17 - access("T"."CALENDAR_YEAR"=1998) 18 - access("S"."TIME_ID"="T"."TIME_ID") 19 - access("S"."CHANNEL_ID"=2) 22 - filter("C"."CUST_STATE_PROVINCE"='CA') 23 - access(ROWID=ROWID) 25 - access("C"."CUST_STATE_PROVINCE"='CA') 27 - access("S"."CUST_ID"="C"."CUST_ID") Note ----- - star transformation used for this statement
Ok, so the execution time is slightly higher but the cost is lower, 201 to 698. Given the small data set size and the unscientific conditions, I wasn't too surprised about the slight time discrepency, but the execution plan cost was lower, which is what I expected. I'll run some more tests later and check that the star transform does infact return the data faster as well as generate a lower-cost plan. Note also the Note in the autotrace output, "-- Star transformation used for this statement".
Now I run the same test on the table with regular non-unique indexes.
SQL> select sum(amount_sold), count(*) 2 from sales_idx s, customers c, channels ch, 3 times t, products p 4 where s.prod_id = p.prod_id 5 and s.time_id = t.time_id 6 and s.channel_id = ch.channel_id 7 and s.cust_id = c.cust_id 8 and c.cust_state_province = 'CA' 9 and t.calendar_year = 1998 10 and t.calendar_month_number = 11 11 and ch.channel_id = 2 12 and p.prod_category_desc = 'Hardware'; SUM(AMOUNT_SOLD) COUNT(*) ---------------- ---------- 84735.44 56 Elapsed: 00:00:01.88 Execution Plan ---------------------------------------------------------- Plan hash value: 747039580 ---------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 31 | 3625 (7)| 00:00:44 | | 1 | SORT AGGREGATE | | 1 | 31 | | | | 2 | NESTED LOOPS | | 60 | 1860 | 3625 (7)| 00:00:44 | |* 3 | INDEX UNIQUE SCAN | CHANNELS_PK | 1 | 3 | 0 (0)| 00:00:01 | |* 4 | TABLE ACCESS BY INDEX ROWID | SALES_IDX | 60 | 1680 | 3625 (7)| 00:00:44 | | 5 | BITMAP CONVERSION TO ROWIDS | | | | | | | 6 | BITMAP AND | | | | | | | 7 | BITMAP MERGE | | | | | | | 8 | BITMAP KEY ITERATION | | | | | | | 9 | TABLE ACCESS BY INDEX ROWID | TIMES | 30 | 450 | 11 (0)| 00:00:01 | | 10 | BITMAP CONVERSION TO ROWIDS | | | | | | | 11 | BITMAP AND | | | | | | |* 12 | BITMAP INDEX SINGLE VALUE | CALENDAR_MONTH_NUMBER_BIX | | | | | |* 13 | BITMAP INDEX SINGLE VALUE | CALENDAR_YEAR_BIX | | | | | | 14 | BITMAP CONVERSION FROM ROWIDS| | | | | | |* 15 | INDEX RANGE SCAN | TIME_ID_IDX | | | 14 (0)| 00:00:01 | | 16 | BITMAP MERGE | | | | | | | 17 | BITMAP KEY ITERATION | | | | | | |* 18 | TABLE ACCESS FULL | PRODUCTS | 2 | 42 | 3 (0)| 00:00:01 | | 19 | BITMAP CONVERSION FROM ROWIDS| | | | | | |* 20 | INDEX RANGE SCAN | PROD_ID_IDX | | | 215 (2)| 00:00:03 | | 21 | BITMAP MERGE | | | | | | | 22 | BITMAP KEY ITERATION | | | | | | |* 23 | VIEW | index$_join$_017 | 3506 | 56096 | 151 (3)| 00:00:02 | |* 24 | HASH JOIN | | | | | | | 25 | BITMAP CONVERSION TO ROWIDS| | 3506 | 56096 | 2 (0)| 00:00:01 | |* 26 | BITMAP INDEX SINGLE VALUE | CUST_STATE_PROVINCE_BIX | | | | | | 27 | INDEX FAST FULL SCAN | CUSTOMERS_PK | 3506 | 56096 | 148 (3)| 00:00:02 | | 28 | BITMAP CONVERSION FROM ROWIDS| | | | | | |* 29 | INDEX RANGE SCAN | CUST_ID_IDX | | | 4 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access("CH"."CHANNEL_ID"=2) 4 - filter("S"."CHANNEL_ID"=2) 12 - access("T"."CALENDAR_MONTH_NUMBER"=11) 13 - access("T"."CALENDAR_YEAR"=1998) 15 - access("S"."TIME_ID"="T"."TIME_ID") 18 - filter("P"."PROD_CATEGORY_DESC"='Hardware') 20 - access("S"."PROD_ID"="P"."PROD_ID") 23 - filter("C"."CUST_STATE_PROVINCE"='CA') 24 - access(ROWID=ROWID) 26 - access("C"."CUST_STATE_PROVINCE"='CA') 29 - access("S"."CUST_ID"="C"."CUST_ID") Note ----- - star transformation used for this statement
Now that's interesting. The data comes back faster than when we had star transformations disabled, the cost is slightly higher, and the autotrace output reports that a star transformation was in fact used on this query, even though the fact table didn't have bitmap indexes on its dimension columns. Looking at the execution plan (and I'm certainly no expert on interpreting explain plans here, so there's a good chance I've got the wrong end of the stick here) but it looks like the optimizer is creating on-the-fly bitmap indexes ("BITMAP CONVERSION FROM ROWIDS") and then scanning these bitmaps as part of a star transformation ("BITMAP CONVERSION TO ROWIDS"). Contrast this with the bitmap-indexed tables' execution plan, where only the "BITMAP CONVERSION TO ROWIDS" step takes place. I could well be wrong here (I suspect I am) but if anyone knows any more about this - I'm on 10g 10.2.0.3 Enterprise Edition on Windows XP SP2 - let me know as I'm definitely interested to know what's going on.
So, it looks like at least in terms of generating the most efficient execution plan, a fact table with bitmap indexes on the dimension key columns looks like being the optimal solution, although this apparent "on-the-fly" creation of bitmap indexes from regular B*Tree indexes looks interesting. So what about the question about index size?
SQL> select segment_name 2 , bytes/1024000 "Size in MB" 3 from user_segments 4 where segment_name like 'CUST_ID%' 5 or segment_name like 'PROD_ID%' 6 or segment_name like 'CHANNEL_ID%' 7 or segment_name like 'TIME_ID%' 8 / SEGMENT_NAME Size in MB -------------------- ---------- CHANNEL_ID_BIX 3.072 CHANNEL_ID_IDX 122.88 CUST_ID_BIX 27.648 CUST_ID_IDX 131.072 PROD_ID_BIX 2.048 PROD_ID_IDX 122.88 TIME_ID_BIX 4.096 TIME_ID_IDX 163.84 8 rows selected.
So the normal non-unique indexes are currently over 100MB in size each, whilst the bitmap indexes (which are highly compressed) are below 5MB for the small dimensions, and around 27MB for the large, 55000-row dimension. Now we'll see how these are affected by inserts and updates to the tables.
SQL> insert into sales_bix 2 select * 3 from sh.sales 4 / 918843 rows created. Elapsed: 00:00:56.67 SQL> commit 2 / Commit complete. Elapsed: 00:00:00.39 SQL> insert into sales_idx 2 select * 3 from sh.sales 4 / 918843 rows created. Elapsed: 00:09:55.17 SQL> commit; Commit complete. Elapsed: 00:00:00.02 SQL> SQL> select segment_name 2 , bytes/1024000 "Size in MB" 3 from user_segments 4 where segment_name like 'CUST_ID%' 5 or segment_name like 'PROD_ID%' 6 or segment_name like 'CHANNEL_ID%' 7 or segment_name like 'TIME_ID%' 8 / SEGMENT_NAME Size in MB -------------------- ---------- CHANNEL_ID_BIX 3.072 CHANNEL_ID_IDX 139.264 CUST_ID_BIX 27.648 CUST_ID_IDX 180.224 PROD_ID_BIX 2.048 PROD_ID_IDX 147.456 TIME_ID_BIX 4.096 TIME_ID_IDX 196.608 8 rows selected.
Ok, so the table with bitmapped indexes took just under a minute to insert rows in to (1m rows into a table of about 7m rows) and the indexes stayed the same size, whilst the corresponding insert into the table with regular indexes took around 9 minutes and increased the index size by between 10% and 40%. Note that with this insert, no new dimension key values were used, it was the same set of data being inserted back into the test tables.
And now the same for updates. I'll do updates to the CUST_ID column as this has the largest amount of key values (55000-odd). Here are the results:
SQL> update sales_bix 2 set cust_id = 3001 3 where time_id between to_date('01-jan-2000','DD-MON-YYYY') 4 and to_date('30-apr-2000','DD-MON-YYYY') 5 and channel_id =3 6 / 461151 rows updated. Elapsed: 00:00:45.51 SQL> commit; Commit complete. Elapsed: 00:00:00.01 SQL> update sales_idx 2 set cust_id = 3000 3 where time_id between to_date('01-jan-1999','DD-MON-YYYY') 4 and to_date('30-apr-1999','DD-MON-YYYY') 5 and channel_id = 2 6 / 186120 rows updated. Elapsed: 00:03:01.97 SQL> update sales_idx 2 set cust_id = 3001 3 where time_id between to_date('01-jan-2000','DD-MON-YYYY') 4 and to_date('30-apr-2000','DD-MON-YYYY') 5 and channel_id =3 6 / 461151 rows updated. Elapsed: 00:06:18.26 SQL> commit; Commit complete. Elapsed: 00:00:00.05 SQL> select segment_name 2 , bytes/1024000 "Size in MB" 3 from user_segments 4 where segment_name like 'CUST_ID%' 5 or segment_name like 'PROD_ID%' 6 or segment_name like 'CHANNEL_ID%' 7 or segment_name like 'TIME_ID%' 8 / SEGMENT_NAME Size in MB -------------------- ---------- CHANNEL_ID_BIX 3.072 CHANNEL_ID_IDX 139.264 CUST_ID_BIX 28.672 CUST_ID_IDX 196.608 PROD_ID_BIX 2.048 PROD_ID_IDX 147.456 TIME_ID_BIX 4.096 TIME_ID_IDX 196.608 8 rows selected.
Ok, so the CUST_ID_IDX index has got bigger, but the bitmap indexes have stayed the same. What if I create another 5,000 customers, create another 100,000 sales records using these new customers and add them two the two fact tables?
SQL> insert into sales_bix 2 select * 3 from new_sales 4 / 100000 rows created. Elapsed: 00:00:03.78 SQL> insert into sales_idx 2 select * 3 from new_sales 4 / 100000 rows created. Elapsed: 00:00:33.18 SQL> commit; Commit complete. Elapsed: 00:00:00.01 SQL> select segment_name 2 , bytes/1024000 "Size in MB" 3 from user_segments 4 where segment_name like 'CUST_ID%' 5 or segment_name like 'PROD_ID%' 6 or segment_name like 'CHANNEL_ID%' 7 or segment_name like 'TIME_ID%' 8 / SEGMENT_NAME Size in MB ------------------------- ---------- CHANNEL_ID_BIX 3.072 CHANNEL_ID_IDX 147.456 CUST_ID_BIX 28.672 CUST_ID_IDX 196.608 PROD_ID_BIX 2.048 PROD_ID_IDX 147.456 TIME_ID_BIX 4.096 TIME_ID_IDX 196.608 8 rows selected.
So, no difference then, and the bitmap-indexed table is still faster to insert data in to, compared to a regularly-indexed table.
What I've found then is that, with my data set and my laptop, there's no noticeable degradation when maintaining a fact table with bitmap indexes on the fact table columns, and in fact activities on this table are faster than the same activities on a table with B*Tree indexes. Now I'm of course open to the idea that my data set is skewed, or my environment is unrepresentative, but I'm surprised nonetheless as I expected all hell to break loose once I tried updating and inserting into column with bitmap indexes on them, and that's not what I got in my testing. Maybe my fact table is too small (7m rows) or my dimensions are too small (two are <100 rows, one is 1.5k and one is 55k rows), but there was a clear performance difference between the table with regular indexes and bitmap indexes, so we can't put it down to such a small data set that all activities ran in just a second or so. Strange.
Also, the star transformation taking place on the table with regular, non-unique indexes was strange as well. I had a quick email exchange with Peter Scott and he mentioned that DB/2, for example, also creates bitmap indexes on the fly, so this could be something that was introduced in 10gR1 or R2 and I didn't notice at the time. It was an interesting exercise though as I thought firstly that bitmap indexes being updated would be a performance killer, and it wasn't, and I was pleasantly surprised when the star transformation took place so easily (and in fact appears to have taken place even when I didn't have the neccessary bitmap indexes in place).
We'll be running all this on the real data set in a couple of weeks, including trying to update table columns with bitmap indexes on them, so it'll be interesting to see whether this test data set ends up being representative of the real-world data; also, if the bitmap indexes revert to type (which I suspect they will) it'll be interesting to see where I went wrong in my testing. I've you've spotted anything already (or indeed if this is in fact the correct behavior, especially the star transformation bit) let me know as I'm keen to get these unresolved bits sorted out. Also, I'm conscious that my testing is on 10g 10.2.0.3 whilst the client is on 10.1.0.5, so if this is a 10gR2-specific behavior, again let me know.