Partition Exchange Loading using OWB10gR2
In this final posting in the series, I'll look at how Oracle Warehouse Builder 10.2 can be used to perform Partition Exchange Loading into a target table.
Partition Exchange Loading (PEL) s a data warehousing technique that is useful when you are loading data into a large, partitioned table. By swapping the table that contains the data to be loaded with an empty partition in the partitioned table, the data appears in the target table instantaneously together with any indexes associated with it, giving you the ability to prepare, load and index data offline, and then have it instantly appear in the data warehouse. Oracle Warehouse Builder has supported partition exchange loading for some time now, and in this article I'll go through how it works with version 10.2
For partition exchange to work, you need to satisfy certain conditions. Some of them are general to the Oracle database, and some are restrictions put in place by Warehouse Builder.
- The table to be loaded must be partitioned (obviously), and additionally for PEL to work in Warehouse Builder the table must be range partitioned on a date datatype column.
- The target table must only have local indexes
- The source table must have the same column definition as the target table (in the same order, with the same datatypes and lengths)
- The source table should have the same index definition as the target table, although as usually it won't be partitioned the indexes here will be global
- The source table must have the same constraints as the target table.
In this example, I'll base my target table on the UNITS_FACT table from the Global Sample Schema, but I'll turn the MONTH_ID column, which in the sample schema is a number column, into a date datatype using an OWB mapping.
I first of all use the mapping to define the table, by joining a view over the UNITS_FACT table that only returns data from dates before 2004 to the DATE_DIM table to get hold of the transaction date in the required format, and then using this to define a new table which I then create and bind into the repository. Then, after creating and binding the table in the repository, I can then add the partition definition to the table.
The trick here is to make sure you range partition on a date column, and name the partitions to a set standard:
-
Ydddd if they are year partitions, i.e. Y2000, Y2001 etc
-
Ydddd_Qd for quarter partitions, i.e. Y2000_Q1
-
Ydddd_Qd_Mdd for month partitions, i..e Y2000_Q1_M03
-
Ydddd_Qd_Mdd_Ddd for days
-
Ydddd_Qd_Mdd_Ddd_Hdd for hours, or
-
Ydddd_Qd_Mdd_Ddd_Hdd_Mdd for minute partitions
In my case, I want to partition the table by year, and so I call the partitions Y1999, Y2000, Y2002 through to Y2005. When you use this naming convention, Warehouse Builder auto-fills the "values" column with the correct end date for the partition.
Note that you need to create in advance the partition that you're going to load data into - the PEL process doesn't create the partition itself. In my case, the mapping I've just used is going to load data up until 2003 into the table, and I'm going to create another mapping in a moment to partition exchange load additional data from 2004, so I create partitions for all years from 1999 up to 2005.
Next I add the index definition, creating a single local index covering all four key columns, which picks up the partitioning scheme from the parent table.
Now I add the primary key constraint, built on the same four columns in the same order as the index.
Then, I close the Data Object Editor and right-click on the table in the project explorer, to bring up the table configuration. Using the properties panel, I select "Use Indexes" against the primary key element to tell Oracle to use the existing unique index on the table rather than automatically create one when creating the primary key constraint.
Now that the target, partitioned table definition is complete, I run the mapping to bring across the initial set of data, which goes up to 2003.
Next I have to work on the source for the partition exchange load. In Warehouse Builder, there are two types of partition exchange load;
-
Direct, where the source table is directly swapped with the partition in the target table, leaving you with an empty source table afterwards, and
-
Indirect, where OWB creates an intermediate staging table from data in your source, and swaps that with the target partition.
Direct is faster, as it just requires OWB to issue some DDL, whilst indirect involves populating an intermediate table; the only reason you'd use the latter is if you haven't got a source table to swap, i.e. what you're PEL'ing is the results of a jointer, or a splitter, or something else that's not a "table". In our case, we'll go for direct as then we'll see the swapping in action and we'll be able to tell it worked, as our source table will be empty at the end.
To do this, I need an intermediate staging table, which I call UNITS_FACT_STAGING. I create this in the same way as before, creating a mapping to take data from another view over the UNITS_FACT table in the Global Sample Schema, this time with a WHERE clause limiting data to just 2004. After binding and creating the table, I enter the Data Object Editor and define the index, which in this case is global as the table isn't partitioned.
After that I define the primary key the same was as before, and select "Using Index" from the configuration page to tell Oracle to re-use the index. Then, I deploy the table, deploy the mapping and bring the data across. I'm all set now to put together the partition exchange load.
Before I do this, I go into SQL*Plus and run some SELECTs.
SQL> conn global_pel/password@ora10g Connected. SQL> select count(*) 2 from global.units_fact 3 /COUNT(*)
222589
SQL> select count(*)
2 from units_fact_partitioned
3 /COUNT(*)
196038
SQL> select count(*)
2 from units_fact_partitioned partition(Y2003)
3 /COUNT(*)
43800
SQL> select count(*)
2 from units_fact_partitioned partition(Y2004)
3 /COUNT(*)
0
SQL> select count(*)
2 from units_fact_staging
3 /COUNT(*)
26551</pre>
OK, that looks good. The target partitioned table currently is missing the data for 2004, which can in fact be found in the staging table. Now it's time to put the PEL mapping together.
The mapping itself is very simple. I just drag the source and target tables onto a mapping canvas, and connect the columns I wish to map. In reality, as I'm looking to do a direct PEL, I only really need to map one column, as the data will be loaded "en-masse" by a DDL swap of the segments rather than at column level - the reason you'd map the rest is if you're doing an indirect PEL and Warehouse Builder needs to create an intermediate table, or if you want there to be a fall-back in case PEL can't happen. However, in the latter case, the only reason PEL won't happen is because the conditions aren't met - either there's a global index on the target table, or the source table definition doesn't match that of the target table - and in this case, I'd want to fix the problem in the first place, not have Warehouse Builder fail-over to a much slower method.
Anyway, once the objects are mapped on the canvas, I need to close the mapping and right-click on it in the Project Explorer to configure it. This is where you tell Warehouse Builder to load using PEL.
Now, it's a case of deploying the mapping, running it, and checking the results. Going back to SQL*Plus, I run SELECTs on the target and source tables:
SQL> select count(*) 2 from units_fact_partitioned partition(Y2004) 3 /COUNT(*)
26551
SQL> select count(*)
2 from units_fact_staging
3 /COUNT(*)
0</pre>
It's worked. Warehouse Builder has swapped the UNITS_FACT_STAGING table and it's index with the empty Y2004 partition in the UNITS_FACT_PARTITIONED table and it's local index partition, instantly loading the partitioned table and making it ready for use.