Transcend and Segment-Switching
As you read this latest post on the capabilities of Transcend, I'd like to pose a question to you: would you consider using Transcend in your environment if it were open-sourced? Currently, we implement Transcend as part of the Rittman Mead Rapid Deployment Framework for our clients, but we've been considering opening it up to the community. We've made no iron-clad decisions, but I'm curious whether our readers have any opinions.
Transcend supports the notion of what I like to call "segment-switching": table renames and partition exchanges. Each is suited to different purposes, though in some situations it boils down to preference. I'll demonstrate both methods, along with some of the product's ancillary features, while setting up my test case.
First, I'll create a version of the SH.SALES table in another schema to act as my fact table, keeping the partition information, adding the indexes, and transferring the statistics:
SQL> BEGIN
  2     trans_etl.build_table(
  3        p_table        => 'sales_fact',
  4        p_owner        => 'target',
  5        p_source_table => 'sales',
  6        p_source_owner => 'sh',
  7        p_tablespace   => 'users',
  8        p_partitioning => 'yes',
  9        p_rows         => 'yes',
 10        p_indexes      => 'yes',
 11        p_constraints  => 'no',
 12        p_statistics   => 'transfer'
 13     );
 14  END;
 15  /
Table TARGET.SALES_FACT created
Number of records inserted into TARGET.SALES_FACT: 918843
Statistics from SH.SALES transferred to TARGET.SALES_FACT
Index SALES_FACT_CHANNEL_BIX built
Index SALES_FACT_CUST_BIX built
Index SALES_FACT_PROD_BIX built
Index SALES_FACT_PROMO_BIX built
Index SALES_FACT_TIME_BIX built
5 index creation processes executed for TARGET.SALES_FACT

PL/SQL procedure successfully completed.

SQL>
Now I'll create a staging table based on the same SALES table, but this time, I won't build the indexes:
SQL> BEGIN
  2     trans_etl.build_table(
  3        p_table        => 'sales_stg',
  4        p_owner        => 'target',
  5        p_source_table => 'sales',
  6        p_source_owner => 'sh',
  7        p_tablespace   => 'users',
  8        p_partitioning => 'yes',
  9        p_rows         => 'yes',
 10        p_indexes      => 'no',
 11        p_constraints  => 'no',
 12        p_statistics   => 'no'
 13     );
 14  END;
 15  /
Table TARGET.SALES_STG created
Number of records inserted into TARGET.SALES_STG: 918843

PL/SQL procedure successfully completed.

SQL>
Now that I have the staging table containing the rows I want in the target table, I'll use the REPLACE_TABLE procedure to interchange the two tables, handling all index and constraint maintenance along the way. The P_IDX_CONCURRENCY and P_CON_CONCURRENCY parameters determine whether the indexes and constraints are built sequentially, one after another in a loop, or submitted as separate DDL statements to the Oracle Scheduler so they can run concurrently, with Transcend waiting for them all to complete:
SQL> BEGIN
  2     trans_etl.replace_table(
  3        p_owner           => 'target',
  4        p_table           => 'sales_fact',
  5        p_source_table    => 'sales_stg',
  6        p_idx_concurrency => 'yes',
  7        p_con_concurrency => 'yes',
  8        p_statistics      => 'transfer'
  9     );
 10  END;
 11  /
Statistics from TARGET.SALES_FACT transferred to TARGET.SALES_STG
Oracle scheduler job BUILD_INDEXES841 created
Oracle scheduler job BUILD_INDEXES841 enabled
Index SALES_STG_CHANNEL_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES842 created
Oracle scheduler job BUILD_INDEXES842 enabled
Index SALES_STG_CUST_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES843 created
Oracle scheduler job BUILD_INDEXES843 enabled
Index SALES_STG_PROD_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES844 created
Oracle scheduler job BUILD_INDEXES844 enabled
Index SALES_STG_PROMO_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES845 created
Oracle scheduler job BUILD_INDEXES845 enabled
Index SALES_STG_TIME_BIX creation submitted to the Oracle scheduler
5 index creation processes submitted to the Oracle scheduler for TARGET.SALES_STG
No matching constraints found on TARGET.SALES_FACT
No matching constraints to drop found on TARGET.SALES_FACT
5 indexes dropped on TARGET.SALES_FACT
TARGET.SALES_STG and TARGET.SALES_FACT table names interchanged
Index TARGET.SALES_STG_CHANNEL_BIX renamed to SALES_FACT_CHANNEL_BIX
Index TARGET.SALES_STG_CUST_BIX renamed to SALES_FACT_CUST_BIX
Index TARGET.SALES_STG_TIME_BIX renamed to SALES_FACT_TIME_BIX
Index TARGET.SALES_STG_PROMO_BIX renamed to SALES_FACT_PROMO_BIX
Index TARGET.SALES_STG_PROD_BIX renamed to SALES_FACT_PROD_BIX

PL/SQL procedure successfully completed.

SQL> select count(*) from target.sales_fact;

  COUNT(*)
----------
    918843

1 row selected.

SQL>
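For the curious: each of those concurrent builds is just a one-off Oracle Scheduler job wrapping a single DDL statement. Conceptually, each submission looks something like the sketch below; this is my illustration rather than Transcend's actual code, and the job name is made up:

BEGIN
   -- a self-dropping, immediately-enabled job whose only purpose
   -- is to execute one piece of DDL in the background
   dbms_scheduler.create_job(
      job_name   => 'BUILD_INDEX_DEMO',
      job_type   => 'PLSQL_BLOCK',
      job_action => q'[BEGIN
                          EXECUTE IMMEDIATE
                             'create bitmap index sales_stg_prod_bix
                                 on target.sales_stg (prod_id) local';
                       END;]',
      enabled    => TRUE,
      auto_drop  => TRUE
   );
END;
/

Once all the jobs are submitted, Transcend waits for every build to finish before moving on.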
Notice that the P_SOURCE_OWNER parameter is not available in this procedure. That's because table renames cannot be done across schemas. Also, neither, one, or both of the tables can be partitioned; Transcend adjusts the index DDL accordingly. The functionality is not unlike the DBMS_REDEFINITION package, though DBMS_REDEFINITION does a good deal more. I have a task on the product roadmap to investigate rewriting REPLACE_TABLE to use DBMS_REDEFINITION... but that's for another day.
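Incidentally, once the index and constraint shuffling is out of the way, the switch itself boils down to a handful of renames, which is exactly why both tables must live in the same schema: the RENAME TO clause only accepts an unqualified name. A conceptual sketch, not Transcend's internals (the intermediate name is made up):

-- swap the two tables via an intermediate name; each table
-- necessarily stays in its original schema
ALTER TABLE target.sales_fact RENAME TO sales_fact_old;
ALTER TABLE target.sales_stg RENAME TO sales_fact;
ALTER TABLE target.sales_fact_old RENAME TO sales_stg;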
The more meaningful segment-switching process is the good old-fashioned partition exchange. This is useful in lots of load scenarios, especially when a fact table is involved. Partition-exchange loading allows maintenance-type tasks such as index rebuilds and constraint validation to occur on a staging table without affecting the actual reporting table, so report queries can continue to run while these tasks complete.
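For anyone new to the technique, the heart of a partition exchange is a single, near-instantaneous DDL statement that swaps a table segment with a partition segment in the data dictionary. Stripped of everything Transcend does around it, it looks like this (a bare-bones sketch using the tables from this post):

-- no rows are physically moved; only dictionary metadata changes
ALTER TABLE target.sales_fact
   EXCHANGE PARTITION sales_q4_2003
   WITH TABLE stage.sales_stg
   INCLUDING INDEXES
   WITHOUT VALIDATION;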
I'll create a new non-partitioned table without rows, and then re-insert all the rows from the SH.SALES table, adjusting the TIME_ID values so that they correspond to the highest partition in the SALES_FACT table: SALES_Q4_2003.
SQL> BEGIN
  2     trans_etl.build_table(
  3        p_table        => 'sales_stg',
  4        p_owner        => 'stage',
  5        p_source_table => 'sales',
  6        p_source_owner => 'sh',
  7        p_tablespace   => 'users',
  8        p_partitioning => 'no',
  9        p_rows         => 'no',
 10        p_indexes      => 'no',
 11        p_constraints  => 'no',
 12        p_statistics   => 'ignore'
 13     );
 14  END;
 15  /
Table STAGE.SALES_STG created

PL/SQL procedure successfully completed.

SQL> SELECT dbms_metadata.get_ddl('TABLE','SALES_STG','STAGE') from dual;

DBMS_METADATA.GET_DDL('TABLE','SALES_STG','STAGE')
-----------------------------------------------------------------------
CREATE TABLE "STAGE"."SALES_STG"
   (    "PROD_ID" NUMBER,
        "CUST_ID" NUMBER,
        "TIME_ID" DATE,
        "CHANNEL_ID" NUMBER,
        "PROMO_ID" NUMBER,
        "QUANTITY_SOLD" NUMBER(10,2),
        "AMOUNT_SOLD" NUMBER(10,2)
   ) SEGMENT CREATION DEFERRED
  PCTFREE 5 PCTUSED 40 INITRANS 1 MAXTRANS 255
 NOCOMPRESS NOLOGGING
  TABLESPACE "USERS"

1 row selected.

SQL> insert /*+ APPEND */ into stage.sales_stg
  2  select prod_id, cust_id, to_date('10/15/2003','mm/dd/yyyy'),
  3         channel_id, promo_id, quantity_sold, amount_sold
  4    from sh.sales;

918843 rows created.

SQL> commit;

Commit complete.

SQL>
As the DBMS_METADATA output above shows, Transcend converted the DDL from the SH.SALES table to make it a non-partitioned table. Now I'll perform the partition exchange:
SQL> BEGIN
  2     trans_etl.exchange_partition(
  3        p_table           => 'sales_fact',
  4        p_owner           => 'target',
  5        p_source_table    => 'sales_stg',
  6        p_source_owner    => 'stage',
  7        p_idx_concurrency => 'yes',
  8        p_statistics      => 'transfer'
  9     );
 10  END;
 11  /
Statistics from partition SALES_Q4_2003 of TARGET.SALES_FACT transferred to STAGE.SALES_STG
Oracle scheduler job BUILD_INDEXES871 created
Oracle scheduler job BUILD_INDEXES871 enabled
Index SALES_STG_CHANNEL_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES872 created
Oracle scheduler job BUILD_INDEXES872 enabled
Index SALES_STG_CUST_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES873 created
Oracle scheduler job BUILD_INDEXES873 enabled
Index SALES_STG_PROD_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES874 created
Oracle scheduler job BUILD_INDEXES874 enabled
Index SALES_STG_PROMO_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES875 created
Oracle scheduler job BUILD_INDEXES875 enabled
Index SALES_STG_TIME_BIX creation submitted to the Oracle scheduler
5 index creation processes submitted to the Oracle scheduler for STAGE.SALES_STG
No matching constraints found on TARGET.SALES_FACT
STAGE.SALES_STG exchanged for partition SALES_Q4_2003 of table TARGET.SALES_FACT
No matching constraints to drop found on STAGE.SALES_STG
5 indexes dropped on STAGE.SALES_STG

PL/SQL procedure successfully completed.

SQL> select count(*) from target.sales_fact;

  COUNT(*)
----------
   1837686

1 row selected.

SQL>
You can see that Transcend handled all of the indexes and even built them concurrently in the background. You may also notice that I never specified which partition to exchange the table into. I could have specified this with the P_PARTNAME parameter, but when it's left null, Transcend assumes I want the highest partition.
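If I had wanted a specific partition, the call would look something like this (my sketch: P_PARTNAME is the real parameter, and the rest mirrors the call above):

BEGIN
   trans_etl.exchange_partition(
      p_table        => 'sales_fact',
      p_owner        => 'target',
      p_source_table => 'sales_stg',
      p_source_owner => 'stage',
      p_partname     => 'sales_q4_2003',  -- name the partition explicitly
      p_statistics   => 'transfer'
   );
END;
/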
Partition-exchange loading is recognized as an effective method for loading fact tables... but can it be used for dimension tables as well? I would argue it can; in fact, Transcend uses partition-exchange loading to perform hybrid SCD Type 1 and Type 2 loading techniques (more on that in a future post). Partition exchanges can provide the same high-availability and background-load scenarios used for fact tables... but the issue is that dimension tables are generally not distributed evenly across time the way fact tables are: they usually aren't partitioned at all, and when they are, it's normally not by range.
What I often do in these scenarios is use single-partition tables, so the partitioned table is little more than a container for a MAX partition. I create the staging table as the single-partition table and keep the dimension table non-partitioned. Transcend doesn't care whether it's the source table or the target table that is partitioned: it knows that exactly one table in the pair must be partitioned, and it will raise an error if that's not the case. So let's build our dimension table and our single-partition staging table:
SQL> BEGIN
  2     trans_etl.build_table(
  3        p_table        => 'customer_dim',
  4        p_owner        => 'target',
  5        p_source_table => 'customers',
  6        p_source_owner => 'sh',
  7        p_tablespace   => 'users',
  8        p_partitioning => 'no',
  9        p_rows         => 'yes',
 10        p_indexes      => 'yes',
 11        p_constraints  => 'yes',
 12        p_statistics   => 'transfer'
 13     );
 14  END;
 15  /
Table TARGET.CUSTOMER_DIM created
Number of records inserted into TARGET.CUSTOMER_DIM: 55500
Statistics from SH.CUSTOMERS transferred to TARGET.CUSTOMER_DIM
Index CUSTOMER_DIM_GENDER_BIX built
Index CUSTOMER_DIM_MARITAL_BIX built
Index CUSTOMER_DIM_YOB_BIX built
Index CUSTOMER_DIM_PK built
4 index creation processes executed for TARGET.CUSTOMER_DIM
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of unnamed constraint executed
Creation of constraint CUSTOMER_DIM_COUNTRY_FK executed
Creation of constraint CUSTOMER_DIM_PK executed
17 constraints built for TARGET.CUSTOMER_DIM

PL/SQL procedure successfully completed.

SQL>
I'll have to create the single-partition table manually, because Transcend doesn't yet have the functionality to convert the DDL of a non-partitioned table into a partitioned one. It really doesn't matter which column is used as the partitioning column, since the partitioned table is only a container for the MAX partition. Typically, I use the primary key:
SQL> CREATE TABLE STAGE.CUSTOMER_STG
  2  ( CUST_ID                 NUMBER,
  3    CUST_FIRST_NAME         VARCHAR2(20),
  4    CUST_LAST_NAME          VARCHAR2(40),
  5    CUST_GENDER             CHAR(1),
  6    CUST_YEAR_OF_BIRTH      NUMBER(4,0),
  7    CUST_MARITAL_STATUS     VARCHAR2(20),
  8    CUST_STREET_ADDRESS     VARCHAR2(40),
  9    CUST_POSTAL_CODE        VARCHAR2(10),
 10    CUST_CITY               VARCHAR2(30),
 11    CUST_CITY_ID            NUMBER,
 12    CUST_STATE_PROVINCE     VARCHAR2(40),
 13    CUST_STATE_PROVINCE_ID  NUMBER,
 14    COUNTRY_ID              NUMBER,
 15    CUST_MAIN_PHONE_NUMBER  VARCHAR2(25),
 16    CUST_INCOME_LEVEL       VARCHAR2(30),
 17    CUST_CREDIT_LIMIT       NUMBER,
 18    CUST_EMAIL              VARCHAR2(30),
 19    CUST_TOTAL              VARCHAR2(14),
 20    CUST_TOTAL_ID           NUMBER,
 21    CUST_SRC_ID             NUMBER,
 22    CUST_EFF_FROM           DATE,
 23    CUST_EFF_TO             DATE,
 24    CUST_VALID              VARCHAR2(1)
 25  )
 26  partition BY range (cust_id)
 27  ( partition max VALUES less than (MAXVALUE))
 28  /

Table created.

SQL>
Now I'll perform the partition exchange in the other direction, where the source table is the partitioned table and the target table is the non-partitioned one. For Transcend, the source and target roles are determined by which segment is being loaded, not by which one is being exchanged into:
SQL> BEGIN
  2     trans_etl.exchange_partition(
  3        p_table           => 'customer_dim',
  4        p_owner           => 'target',
  5        p_source_table    => 'customer_stg',
  6        p_source_owner    => 'stage',
  7        p_idx_concurrency => 'yes',
  8        p_con_concurrency => 'yes',
  9        p_statistics      => 'transfer'
 10     );
 11  END;
 12  /
Statistics from TARGET.CUSTOMER_DIM transferred to partition MAX of STAGE.CUSTOMER_STG
Oracle scheduler job BUILD_INDEXES876 created
Oracle scheduler job BUILD_INDEXES876 enabled
Index CUSTOMER_STG_GENDER_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES877 created
Oracle scheduler job BUILD_INDEXES877 enabled
Index CUSTOMER_STG_MARITAL_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES878 created
Oracle scheduler job BUILD_INDEXES878 enabled
Index CUSTOMER_STG_YOB_BIX creation submitted to the Oracle scheduler
Oracle scheduler job BUILD_INDEXES879 created
Oracle scheduler job BUILD_INDEXES879 enabled
Index CUSTOMER_STG_PK creation submitted to the Oracle scheduler
4 index creation processes submitted to the Oracle scheduler for STAGE.CUSTOMER_STG
Oracle scheduler job BUILD_CONSTRAINTS880 created
Oracle scheduler job BUILD_CONSTRAINTS880 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS881 created
Oracle scheduler job BUILD_CONSTRAINTS881 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS882 created
Oracle scheduler job BUILD_CONSTRAINTS882 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS883 created
Oracle scheduler job BUILD_CONSTRAINTS883 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS884 created
Oracle scheduler job BUILD_CONSTRAINTS884 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS885 created
Oracle scheduler job BUILD_CONSTRAINTS885 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS886 created
Oracle scheduler job BUILD_CONSTRAINTS886 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS887 created
Oracle scheduler job BUILD_CONSTRAINTS887 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS888 created
Oracle scheduler job BUILD_CONSTRAINTS888 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS889 created
Oracle scheduler job BUILD_CONSTRAINTS889 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS890 created
Oracle scheduler job BUILD_CONSTRAINTS890 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS891 created
Oracle scheduler job BUILD_CONSTRAINTS891 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS892 created
Oracle scheduler job BUILD_CONSTRAINTS892 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS893 created
Oracle scheduler job BUILD_CONSTRAINTS893 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS894 created
Oracle scheduler job BUILD_CONSTRAINTS894 enabled
Creation of unnamed constraint submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS895 created
Oracle scheduler job BUILD_CONSTRAINTS895 enabled
Creation of constraint CUSTOMER_STG_COUNTRY_FK submitted to the Oracle scheduler
Oracle scheduler job BUILD_CONSTRAINTS896 created
Oracle scheduler job BUILD_CONSTRAINTS896 enabled
Creation of constraint CUSTOMER_STG_PK submitted to the Oracle scheduler
17 constraints submitted to the Oracle scheduler for STAGE.CUSTOMER_STG
TARGET.CUSTOMER_DIM exchanged for partition MAX of table STAGE.CUSTOMER_STG
Constraint CUSTOMER_STG_PK dropped
Constraint SYS_C0021255 dropped
Constraint SYS_C0021256 dropped
Constraint SYS_C0021257 dropped
Constraint SYS_C0021263 dropped
Constraint SYS_C0021259 dropped
Constraint SYS_C0021260 dropped
Constraint SYS_C0021261 dropped
Constraint SYS_C0021262 dropped
Constraint SYS_C0021258 dropped
10 constraints dropped on STAGE.CUSTOMER_STG
4 indexes dropped on STAGE.CUSTOMER_STG

PL/SQL procedure successfully completed.

SQL>
I know... there's a lot going on here. First, the statistics are transferred from one segment to the other. The other options for the P_STATISTICS parameter are 'gather' and 'ignore', but the 'transfer' method is generally preferred because it maintains continuity between automatic stats-collection runs. Then all the indexes and constraints are built concurrently, the exchange is performed, and finally those same indexes and constraints are dropped on the new source table in preparation for the next run.
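To give a flavor of what a 'transfer' involves, here's one way to copy statistics from one segment to another using the documented DBMS_STATS export/import routines. This is a minimal sketch of the general technique, not necessarily Transcend's internals; the STATS_TMP table and the XFER identifier are made up:

BEGIN
   -- create a holding table for exported statistics
   dbms_stats.create_stat_table( ownname => 'TARGET', stattab => 'STATS_TMP' );

   -- export the dimension table's statistics into the holding table
   dbms_stats.export_table_stats( ownname => 'TARGET',
                                  tabname => 'CUSTOMER_DIM',
                                  stattab => 'STATS_TMP',
                                  statid  => 'XFER' );

   -- repoint the exported rows at the staging table; dynamic SQL is
   -- needed because STATS_TMP doesn't exist when this block compiles
   -- (column C1 of a stat table holds the object name)
   EXECUTE IMMEDIATE
      q'[UPDATE target.stats_tmp SET c1 = 'CUSTOMER_STG' WHERE statid = 'XFER']';
   COMMIT;

   -- import the statistics against the staging table
   dbms_stats.import_table_stats( ownname => 'STAGE',
                                  tabname => 'CUSTOMER_STG',
                                  stattab => 'STATS_TMP',
                                  statid  => 'XFER',
                                  statown => 'TARGET' );
END;
/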
Hopefully this demonstrates the segment-switching capabilities of Transcend and paves the way for me to describe some of the more advanced features: in particular, handling slowly-changing dimensions in a set-based process, and configuring Transcend "mappings" to correspond with the mappings that get executed as part of the ETL batch run.