Transcend and Segment-Switching
As you read this latest post on the capabilities of Transcend, I'd like to pose a question to you: would you consider using Transcend in your environment if it were open-sourced? Currently, we implement Transcend as part of the Rittman Mead Rapid Deployment Framework for our clients, but we've been considering opening it up to the community. We've made no iron-clad decisions, but I'm curious whether our readers have any opinions.
Transcend supports the notion of what I like to call "segment-switching": table renames and partition exchanges. Each is suited to different purposes, though in some situations it boils down to preference. I'll demonstrate both methods, along with some of the product's ancillary features, while setting up my test case.
First, I'll create a version of the SH.SALES table in another schema to act as my fact table, keeping the partition information, adding the indexes, and transferring the statistics:
SQL> BEGIN
  2     trans_etl.build_table(
  3        p_table        => 'sales_fact',
  4        p_owner        => 'target',
  5        p_source_table => 'sales',
  6        p_source_owner => 'sh',
  7        p_tablespace   => 'users',
  8        p_partitioning => 'yes',
  9        p_rows         => 'yes',
 10        p_indexes      => 'yes',
 11        p_constraints  => 'no',
 12        p_statistics   => 'transfer'
 13     );
 14  END;
 15  /
Table TARGET.SALES_FACT created
Number of records inserted into TARGET.SALES_FACT: 918843
Statistics from SH.SALES transferred to TARGET.SALES_FACT
Index SALES_FACT_CHANNEL_BIX built
Index SALES_FACT_CUST_BIX built
Index SALES_FACT_PROD_BIX built
Index SALES_FACT_PROMO_BIX built
Index SALES_FACT_TIME_BIX built
5 index creation processes executed for TARGET.SALES_FACT

PL/SQL procedure successfully completed.

SQL>
Now I'll create a staging table based on the same SALES table, but this time, I won't build the indexes:
SQL> BEGIN
  2     trans_etl.build_table(
  3        p_table        => 'sales_stg',
  4        p_owner        => 'target',
  5        p_source_table => 'sales',
  6        p_source_owner => 'sh',
  7        p_tablespace   => 'users',
  8        p_partitioning => 'yes',
  9        p_rows         => 'yes',
 10        p_indexes      => 'no',
 11        p_constraints  => 'no',
 12        p_statistics   => 'no'
 13     );
 14  END;
 15  /
Table TARGET.SALES_STG created
Number of records inserted into TARGET.SALES_STG: 918843

PL/SQL procedure successfully completed.

SQL>
Now that I have the staging table containing the rows I want in the target table, I'll use the REPLACE_TABLE procedure to interchange the two tables, handling all index and constraint maintenance along the way. The P_IDX_CONCURRENCY and P_CON_CONCURRENCY parameters determine whether the indexes and constraints are built sequentially, one after another in a loop, or submitted as separate DDL statements to the Oracle Scheduler so they can run concurrently, with Transcend waiting for them all to complete:
SQL> BEGIN
  2     trans_etl.replace_table(
  3        p_owner           => 'target',
  4        p_table           => 'sales_fact',
  5        p_source_table    => 'sales_stg',
  6        p_idx_concurrency => 'yes',
  7        p_con_concurrency => 'yes',
  8        p_statistics      => 'transfer'
  9     );
 10  END;
 11  /
Statistics from TARGET.SALES_FACT transferred to TARGET.SALES_STG
Oracle scheduler job BUILD_INDEXES841 created
Oracle scheduler job BUILD_INDEXES841 enabled
Index SALES_STG_CHANNEL_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES842 created
Oracle scheduler job BUILD_INDEXES842 enabled
Index SALES_STG_CUST_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES843 created
Oracle scheduler job BUILD_INDEXES843 enabled
Index SALES_STG_PROD_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES844 created
Oracle scheduler job BUILD_INDEXES844 enabled
Index SALES_STG_PROMO_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES845 created
Oracle scheduler job BUILD_INDEXES845 enabled
Index SALES_STG_TIME_BIX creation submitted to the Oracle scheduler
5 index creation processes submitted to the Oracle scheduler for TARGET.SALES_STG
No matching constraints found on TARGET.SALES_FACT
No matching constraints to drop found on TARGET.SALES_FACT
5 indexes dropped on TARGET.SALES_FACT
TARGET.SALES_STG and TARGET.SALES_FACT table names interchanged
Index TARGET.SALES_STG_CHANNEL_BIX renamed to SALES_FACT_CHANNEL_BIX
Index TARGET.SALES_STG_CUST_BIX renamed to SALES_FACT_CUST_BIX
Index TARGET.SALES_STG_TIME_BIX renamed to SALES_FACT_TIME_BIX
Index TARGET.SALES_STG_PROMO_BIX renamed to SALES_FACT_PROMO_BIX
Index TARGET.SALES_STG_PROD_BIX renamed to SALES_FACT_PROD_BIX

PL/SQL procedure successfully completed.

SQL> select count(*) from target.sales_fact;

  COUNT(*)
----------
    918843

1 row selected.

SQL>
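For the curious: each of those concurrent builds is just a one-off Oracle Scheduler job wrapping a single DDL statement. Conceptually, each submission looks something like the sketch below; this is my illustration rather than Transcend's actual code, and the job name is made up:

BEGIN
   -- a self-dropping, immediately-enabled job whose only purpose
   -- is to execute one piece of DDL in the background
   dbms_scheduler.create_job(
      job_name   => 'BUILD_INDEX_DEMO',
      job_type   => 'PLSQL_BLOCK',
      job_action => q'[BEGIN
                          EXECUTE IMMEDIATE
                             'create bitmap index sales_stg_prod_bix
                                 on target.sales_stg (prod_id) local';
                       END;]',
      enabled    => TRUE,
      auto_drop  => TRUE
   );
END;
/

Once all the jobs are submitted, Transcend waits for every build to finish before moving on.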
Notice that the P_SOURCE_OWNER parameter is not available in this procedure. That's because table renames cannot be done across schemas. Also, neither, one, or both of the tables can be partitioned; Transcend adjusts the index DDL accordingly. The functionality is not unlike the DBMS_REDEFINITION package, though DBMS_REDEFINITION does a good deal more. I have a task on the product roadmap to investigate rewriting REPLACE_TABLE to use DBMS_REDEFINITION... but that's for another day.
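Incidentally, once the index and constraint shuffling is out of the way, the switch itself boils down to a handful of renames, which is exactly why both tables must live in the same schema: the RENAME TO clause only accepts an unqualified name. A conceptual sketch, not Transcend's internals (the intermediate name is made up):

-- swap the two tables via an intermediate name; each table
-- necessarily stays in its original schema
ALTER TABLE target.sales_fact RENAME TO sales_fact_old;
ALTER TABLE target.sales_stg RENAME TO sales_fact;
ALTER TABLE target.sales_fact_old RENAME TO sales_stg;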
The more meaningful segment-switching process is the good old-fashioned partition exchange. This is useful in lots of load scenarios, especially when a fact table is involved. Partition-exchange loading allows maintenance-type tasks such as index rebuilds and constraint validation to occur on a staging table without affecting the actual reporting table, so report queries can continue to run while these tasks complete.
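For anyone new to the technique, the heart of a partition exchange is a single, near-instantaneous DDL statement that swaps a table segment with a partition segment in the data dictionary. Stripped of everything Transcend does around it, it looks like this (a bare-bones sketch using the tables from this post):

-- no rows are physically moved; only dictionary metadata changes
ALTER TABLE target.sales_fact
   EXCHANGE PARTITION sales_q4_2003
   WITH TABLE stage.sales_stg
   INCLUDING INDEXES
   WITHOUT VALIDATION;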
I'll create a new non-partitioned table without rows, and then re-insert all the rows from the SH.SALES table, adjusting the TIME_ID values so that they correspond to the highest partition in the SALES_FACT table: SALES_Q4_2003.
SQL> BEGIN
  2     trans_etl.build_table(
  3        p_table        => 'sales_stg',
  4        p_owner        => 'stage',
  5        p_source_table => 'sales',
  6        p_source_owner => 'sh',
  7        p_tablespace   => 'users',
  8        p_partitioning => 'no',
  9        p_rows         => 'no',
 10        p_indexes      => 'no',
 11        p_constraints  => 'no',
 12        p_statistics   => 'ignore'
 13     );
 14  END;
 15  /
Table STAGE.SALES_STG created

PL/SQL procedure successfully completed.

SQL> SELECT dbms_metadata.get_ddl('TABLE','SALES_STG','STAGE') from dual;

DBMS_METADATA.GET_DDL('TABLE','SALES_STG','STAGE')
-----------------------------------------------------------------------
CREATE TABLE "STAGE"."SALES_STG"
   (    "PROD_ID" NUMBER,
        "CUST_ID" NUMBER,
        "TIME_ID" DATE,
        "CHANNEL_ID" NUMBER,
        "PROMO_ID" NUMBER,
        "QUANTITY_SOLD" NUMBER(10,2),
        "AMOUNT_SOLD" NUMBER(10,2)
   ) SEGMENT CREATION DEFERRED
  PCTFREE 5 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS NOLOGGING
  TABLESPACE "USERS"

1 row selected.

SQL> insert /*+ APPEND */ into stage.sales_stg
  2  select prod_id, cust_id, to_date('10/15/2003','mm/dd/yyyy'),
  3         channel_id, promo_id, quantity_sold, amount_sold
  4    from sh.sales;

918843 rows created.

SQL> commit;

Commit complete.

SQL>
As the DBMS_METADATA output above shows, Transcend converted the DDL from the SH.SALES table to make it a non-partitioned table. Now I'll perform the partition exchange:
SQL> BEGIN
  2     trans_etl.exchange_partition(
  3        p_table           => 'sales_fact',
  4        p_owner           => 'target',
  5        p_source_table    => 'sales_stg',
  6        p_source_owner    => 'stage',
  7        p_idx_concurrency => 'yes',
  8        p_statistics      => 'transfer'
  9     );
 10  END;
 11  /
Statistics from partition SALES_Q4_2003 of TARGET.SALES_FACT transferred to STAGE.SALES_STG
Oracle scheduler job BUILD_INDEXES871 created
Oracle scheduler job BUILD_INDEXES871 enabled
Index SALES_STG_CHANNEL_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES872 created
Oracle scheduler job BUILD_INDEXES872 enabled
Index SALES_STG_CUST_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES873 created
Oracle scheduler job BUILD_INDEXES873 enabled
Index SALES_STG_PROD_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES874 created
Oracle scheduler job BUILD_INDEXES874 enabled
Index SALES_STG_PROMO_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES875 created
Oracle scheduler job BUILD_INDEXES875 enabled
Index SALES_STG_TIME_BIX creation submitted to the Oracle scheduler
5 index creation processes submitted to the Oracle scheduler for STAGE.SALES_STG
No matching constraints found on TARGET.SALES_FACT
STAGE.SALES_STG exchanged for partition SALES_Q4_2003 of table TARGET.SALES_FACT
No matching constraints to drop found on STAGE.SALES_STG
5 indexes dropped on STAGE.SALES_STG

PL/SQL procedure successfully completed.

SQL> select count(*) from target.sales_fact;

  COUNT(*)
----------
   1837686

1 row selected.

SQL>
You can see that Transcend handled all of the indexes and even built them concurrently in the background. You may also notice that I never specified which partition to exchange the table into. I could have specified this with the P_PARTNAME parameter, but when it's left null, Transcend assumes I want the highest partition.
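If I had wanted a specific partition, the call would look something like this (my sketch: P_PARTNAME is the real parameter, and the rest mirrors the call above):

BEGIN
   trans_etl.exchange_partition(
      p_table        => 'sales_fact',
      p_owner        => 'target',
      p_source_table => 'sales_stg',
      p_source_owner => 'stage',
      p_partname     => 'sales_q4_2003',  -- name the partition explicitly
      p_statistics   => 'transfer'
   );
END;
/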
Partition-exchange loading is recognized as an effective method for loading fact tables... but can it be used for dimension tables as well? I would argue it can; in fact, Transcend uses partition-exchange loading to perform hybrid SCD Type 1 and Type 2 loading techniques (more on that in a future post). Partition exchanges can provide the same high-availability and background-load scenarios used for fact tables... but the issue is that dimension tables are generally not distributed evenly across time the way fact tables are: they usually aren't partitioned at all, and when they are, it's normally not by range.
What I often do in these scenarios is use single-partition tables, so the partitioned table is little more than a container for a MAX partition. I create the staging table as the single-partition table and keep the dimension table non-partitioned. Transcend doesn't care whether it's the source table or the target table that is partitioned: it knows that exactly one table in the pair must be partitioned, and it will raise an error if that's not the case. So let's build our dimension table and our single-partition staging table:
SQL> BEGIN
  2     trans_etl.build_table(
  3        p_table        => 'customer_dim',
  4        p_owner        => 'target',
  5        p_source_table => 'customers',
  6        p_source_owner => 'sh',
  7        p_tablespace   => 'users',
  8        p_partitioning => 'no',
  9        p_rows         => 'yes',
 10        p_indexes      => 'yes',
 11        p_constraints  => 'yes',
 12        p_statistics   => 'transfer'
 13     );
 14  END;
 15  /
Table TARGET.CUSTOMER_DIM created
Number of records inserted into TARGET.CUSTOMER_DIM: 55500
Statistics from SH.CUSTOMERS transferred to TARGET.CUSTOMER_DIM
Index CUSTOMER_DIM_GENDER_BIX built
Index CUSTOMER_DIM_MARITAL_BIX built
Index CUSTOMER_DIM_YOB_BIX built
Index CUSTOMER_DIM_PK built
4 index creation processes executed for TARGET.CUSTOMER_DIM
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of constraint CUSTOMER_DIM_COUNTRY_FK executed
Creation of constraint CUSTOMER_DIM_PK executed
17 constraints built for TARGET.CUSTOMER_DIM

PL/SQL procedure successfully completed.

SQL>
I'll have to create the single-partition table manually, because Transcend doesn't yet have the functionality to convert the DDL of a non-partitioned table into a partitioned one. It really doesn't matter which column is used as the partitioning column, since the partitioned table is only a container for the MAX partition. Typically, I use the primary key:
SQL> CREATE TABLE STAGE.CUSTOMER_STG
  2  ( CUST_ID                 NUMBER,
  3    CUST_FIRST_NAME         VARCHAR2(20),
  4    CUST_LAST_NAME          VARCHAR2(40),
  5    CUST_GENDER             CHAR(1),
  6    CUST_YEAR_OF_BIRTH      NUMBER(4,0),
  7    CUST_MARITAL_STATUS     VARCHAR2(20),
  8    CUST_STREET_ADDRESS     VARCHAR2(40),
  9    CUST_POSTAL_CODE        VARCHAR2(10),
 10    CUST_CITY               VARCHAR2(30),
 11    CUST_CITY_ID            NUMBER,
 12    CUST_STATE_PROVINCE     VARCHAR2(40),
 13    CUST_STATE_PROVINCE_ID  NUMBER,
 14    COUNTRY_ID              NUMBER,
 15    CUST_MAIN_PHONE_NUMBER  VARCHAR2(25),
 16    CUST_INCOME_LEVEL       VARCHAR2(30),
 17    CUST_CREDIT_LIMIT       NUMBER,
 18    CUST_EMAIL              VARCHAR2(30),
 19    CUST_TOTAL              VARCHAR2(14),
 20    CUST_TOTAL_ID           NUMBER,
 21    CUST_SRC_ID             NUMBER,
 22    CUST_EFF_FROM           DATE,
 23    CUST_EFF_TO             DATE,
 24    CUST_VALID              VARCHAR2(1)
 25  )
 26  partition BY range (cust_id)
 27  ( partition max VALUES less than (MAXVALUE))
 28  /

Table created.

SQL>
Now I'll perform the partition exchange in the other direction, where the source table is the partitioned table and the target table is the non-partitioned one. For Transcend, the source and target roles are determined by which segment is being loaded, not by which one is being exchanged into:
SQL> BEGIN
  2     trans_etl.exchange_partition(
  3        p_table           => 'customer_dim',
  4        p_owner           => 'target',
  5        p_source_table    => 'customer_stg',
  6        p_source_owner    => 'stage',
  7        p_idx_concurrency => 'yes',
  8        p_con_concurrency => 'yes',
  9        p_statistics      => 'transfer'
 10     );
 11  END;
 12  /
Statistics from TARGET.CUSTOMER_DIM transferred to partition MAX of STAGE.CUSTOMER_STG
Oracle scheduler job BUILD_INDEXES876 created
Oracle scheduler job BUILD_INDEXES876 enabled
Index CUSTOMER_STG_GENDER_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES877 created
Oracle scheduler job BUILD_INDEXES877 enabled
Index CUSTOMER_STG_MARITAL_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES878 created
Oracle scheduler job BUILD_INDEXES878 enabled
Index CUSTOMER_STG_YOB_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES879 created
Oracle scheduler job BUILD_INDEXES879 enabled
Index CUSTOMER_STG_PK creation submitted to the Oracle scheduler
4 index creation processes submitted to the Oracle scheduler for STAGE.CUSTOMER_STG
Oracle scheduler job BUILD_CONSTRAINTS880 created
Oracle scheduler job BUILD_CONSTRAINTS880 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS881 created
Oracle scheduler job BUILD_CONSTRAINTS881 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS882 created
Oracle scheduler job BUILD_CONSTRAINTS882 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS883 created
Oracle scheduler job BUILD_CONSTRAINTS883 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS884 created
Oracle scheduler job BUILD_CONSTRAINTS884 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS885 created
Oracle scheduler job BUILD_CONSTRAINTS885 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS886 created
Oracle scheduler job BUILD_CONSTRAINTS886 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS887 created
Oracle scheduler job BUILD_CONSTRAINTS887 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS888 created
Oracle scheduler job BUILD_CONSTRAINTS888 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS889 created
Oracle scheduler job BUILD_CONSTRAINTS889 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS890 created
Oracle scheduler job BUILD_CONSTRAINTS890 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS891 created
Oracle scheduler job BUILD_CONSTRAINTS891 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS892 created
Oracle scheduler job BUILD_CONSTRAINTS892 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS893 created
Oracle scheduler job BUILD_CONSTRAINTS893 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS894 created
Oracle scheduler job BUILD_CONSTRAINTS894 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS895 created
Oracle scheduler job BUILD_CONSTRAINTS895 enabled
Creation of constraint CUSTOMER_STG_COUNTRY_FK submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS896 created
Oracle scheduler job BUILD_CONSTRAINTS896 enabled
Creation of constraint CUSTOMER_STG_PK submitted to the Oracle scheduler
17 constraints submitted to the Oracle scheduler for STAGE.CUSTOMER_STG
TARGET.CUSTOMER_DIM exchanged for partition MAX of table STAGE.CUSTOMER_STG
Constraint CUSTOMER_STG_PK dropped
Constraint SYS_C0021255 dropped
Constraint SYS_C0021256 dropped
Constraint SYS_C0021257 dropped
Constraint SYS_C0021263 dropped
Constraint SYS_C0021259 dropped
Constraint SYS_C0021260 dropped
Constraint SYS_C0021261 dropped
Constraint SYS_C0021262 dropped
Constraint SYS_C0021258 dropped
10 constraints dropped on STAGE.CUSTOMER_STG
4 indexes dropped on STAGE.CUSTOMER_STG

PL/SQL procedure successfully completed.

SQL>
I know... there's a lot going on here. First, the statistics are transferred from one segment to the other. The other options for the P_STATISTICS parameter are 'gather' and 'ignore', but the 'transfer' method is generally preferred because it maintains continuity between automatic stats-collection runs. Then all the indexes and constraints are built concurrently, the exchange is performed, and finally those same indexes and constraints are dropped on the new source table in preparation for the next run.
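To give a flavor of what a 'transfer' involves, here's one way to copy statistics from one segment to another using the documented DBMS_STATS export/import routines. This is a minimal sketch of the general technique, not necessarily Transcend's internals; the STATS_TMP table and the XFER identifier are made up:

BEGIN
   -- create a holding table for exported statistics
   dbms_stats.create_stat_table( ownname => 'TARGET', stattab => 'STATS_TMP' );

   -- export the dimension table's statistics into the holding table
   dbms_stats.export_table_stats( ownname => 'TARGET',
                                  tabname => 'CUSTOMER_DIM',
                                  stattab => 'STATS_TMP',
                                  statid  => 'XFER' );

   -- repoint the exported rows at the staging table; dynamic SQL is
   -- needed because STATS_TMP doesn't exist when this block compiles
   -- (column C1 of a stat table holds the object name)
   EXECUTE IMMEDIATE
      q'[UPDATE target.stats_tmp SET c1 = 'CUSTOMER_STG' WHERE statid = 'XFER']';
   COMMIT;

   -- import the statistics against the staging table
   dbms_stats.import_table_stats( ownname => 'STAGE',
                                  tabname => 'CUSTOMER_STG',
                                  stattab => 'STATS_TMP',
                                  statid  => 'XFER',
                                  statown => 'TARGET' );
END;
/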
Hopefully this demonstrates the segment-switching capabilities of Transcend and paves the way for me to describe some of the more advanced features: in particular, handling slowly-changing dimensions in a set-based process, and configuring Transcend "mappings" to correspond with the mappings that get executed as part of the ETL batch run.