Partition Exchange Dataloading
Revision history: minor change to clarify one note, 2-Jun-05.
At the end of 2003 I took on a project to redesign a multi-terabyte data warehouse for one of our customers; changes to their data feeds required the replacement of the majority of the interfaces. I also took the opportunity to move from the old-style 8i code (the original DW went live in 1999) to Oracle 9.2 (10g was precluded because the query tool vendor would not certify their product against it). The original system held 2.25 years of sales history at POS line level, along with other base fact tables for stock and financial reporting. For simplicity I will only write about the sales history.

The 1999 design was an early adopter of partitioned tables: the day-level sales table was partitioned by date with 7 days per partition, and further sub-partitioned by a 4-way hash on the product key. As the customer was a retailer, all dataloads were inserts; there was never a requirement to update any 'open' transactions. The basic approach was to load each fact interface into a staging table, validate the data, and apply any business transforms before bulk-appending the result to the fact table.
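For illustration, the original weekly range / 4-way hash layout would have looked something like the sketch below (the table, column and partition names are invented, not the customer's):

CREATE TABLE day_sales (
  sales_date   DATE    NOT NULL,
  product_key  NUMBER  NOT NULL,
  store_key    NUMBER  NOT NULL,
  qty_sold     NUMBER,
  sales_value  NUMBER
)
PARTITION BY RANGE (sales_date)
SUBPARTITION BY HASH (product_key) SUBPARTITIONS 4
( PARTITION sales_1999_w01 VALUES LESS THAN (TO_DATE('1999-01-08','YYYY-MM-DD')),
  PARTITION sales_1999_w02 VALUES LESS THAN (TO_DATE('1999-01-15','YYYY-MM-DD'))
  -- ...and so on, one partition per week
);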
Reviewing the way people had used the data over four years, we could see that sub-partitioning gave little advantage in terms of partition elimination: very few queries looked at a single product, and the chance of all the products in a query predicate residing in the same sub-partition was increasingly remote. People also needed to look at single days, not necessarily the full 7 days held in a partition. Based on this I decided to change the partition plan to one day per partition with no sub-partitions. This gave a slight reduction in segment size (a week's rows now sit in seven partitions, against four sub-partitions per week in the old system). A single day per partition would also allow simple partition exchange dataloading.
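The revised plan, with the same invented names (note that 9.2 has no interval partitioning, so each daily partition must be created explicitly, typically ahead of time by a calendar-driven ETL job):

CREATE TABLE day_sales (
  sales_date   DATE    NOT NULL,
  product_key  NUMBER  NOT NULL,
  store_key    NUMBER  NOT NULL,
  qty_sold     NUMBER,
  sales_value  NUMBER
)
PARTITION BY RANGE (sales_date)
( PARTITION sales_19990101 VALUES LESS THAN (TO_DATE('1999-01-02','YYYY-MM-DD')),
  PARTITION sales_19990102 VALUES LESS THAN (TO_DATE('1999-01-03','YYYY-MM-DD'))
  -- ...and so on, one partition per day
);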
Partition exchange is basically a name-swap operation between a table and a single partition, and as such it is a quick process. By default the exchange is performed WITH VALIDATION, so it raises an error if the table's rows do not match the partition key; specifying WITHOUT VALIDATION skips this check for speed, and it is then down to the developer to get the data right.

In our sales dataload process we split the sales data into three streams: data that fails our integrity checking (not to be applied), data that relates to the current batch (the bulk of the records), and data that has arrived late, perhaps through problems getting the data back from the stores; 'late data' could cover more than one day. Late data is added to the partitioned table by conventional techniques (insert append); exchange is not directly applicable because the partition is already populated. Current-day data is loaded through exchange. The outline block-level code for this is:
identify the current day;
identify the partition to be exchanged;
create an empty table (T1) for the exchange in the same tablespace as the exchange target partition (can use CTAS with WHERE 1=2);
populate T1;
exchange T1 with the target partition;
drop table T1;
collect stats on the newly populated partition;
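A minimal sketch of that outline in Oracle SQL, assuming the current day is 1-Mar-2004, the validated current-day rows sit in a staging table sales_stage, and the target partition lives in tablespace sales_data (all names invented):

-- create an empty clone of the fact table in the target partition's tablespace
CREATE TABLE t1 TABLESPACE sales_data AS
  SELECT * FROM day_sales WHERE 1 = 2;

-- populate T1 with the current-day rows (direct-path insert)
INSERT /*+ APPEND */ INTO t1
  SELECT * FROM sales_stage
  WHERE sales_date = TO_DATE('2004-03-01','YYYY-MM-DD');
COMMIT;

-- swap T1 with the target partition; WITHOUT VALIDATION skips the
-- partition-bound check, so the ETL must guarantee the dates are right
ALTER TABLE day_sales
  EXCHANGE PARTITION sales_20040301 WITH TABLE t1
  WITHOUT VALIDATION;

-- T1 now holds the old, empty partition segment
DROP TABLE t1;

-- gather stats on the newly populated partition only
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => USER,
    tabname     => 'DAY_SALES',
    partname    => 'SALES_20040301',
    granularity => 'PARTITION');
END;
/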
Notes:
Although it is possible to pre-index the exchange table and include the indexes in the exchange (the INCLUDING INDEXES clause), I tend not to do this.
To avoid constraint errors during the exchange, foreign key constraints should be NOVALIDATE on both the source table and the target partition. I set my fact FK constraints to DISABLE NOVALIDATE RELY and use my ETL layer to enforce integrity.
If you have materialized view logs on your fact table, be prepared to wait while the logs are updated during the exchange partition operation.
It is possible to use partition exchange for late data or fact update operations, but first you will need to populate your source table with the current partition data and then apply the new data or update the old records. You may well find that an 'update-style' partition exchange takes longer than a direct update of the partition, but for those who need 24x7 user access to the data it may be the way to go, as sketched below.
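An illustration of this update-style exchange, reusing the invented names from the earlier sketches and assuming the late rows for 1-Mar-2004 sit in sales_stage:

-- copy the current partition contents into the exchange table
CREATE TABLE t1 TABLESPACE sales_data AS
  SELECT * FROM day_sales PARTITION (sales_20040301);

-- apply the late-arriving rows for that day
INSERT /*+ APPEND */ INTO t1
  SELECT * FROM sales_stage
  WHERE sales_date = TO_DATE('2004-03-01','YYYY-MM-DD');
COMMIT;

-- swap the updated copy back in
ALTER TABLE day_sales
  EXCHANGE PARTITION sales_20040301 WITH TABLE t1
  WITHOUT VALIDATION;

DROP TABLE t1;

Throughout the copy-and-reload users continue to see the old partition contents; only the exchange itself takes a brief DDL lock, which is what makes this attractive for 24x7 access.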