Oracle BI EE 10.1.3.4.1 – Sub-Totals & Pivot Calculations
One intriguing or not-so obvious feature of BI EE is the way it does its Sub-Totals & Pivot Table calculations. As with any reporting tool, we expect almost every part of a report to be pushed back to the database in the form of native SQL. But there are certain cases wherein we cannot push an entire report in the form of SQL (for example Sub-Totals & Pivot table aggregations). In such cases, BI Server does these calculations in its own memory. Lets look at what this “in-memory calculation” means and how BI Server treats totals & Pivot table calculations. Since this is not documented anywhere, this behavior can be different across different releases (though i think it will work the same in all 10.1.3.* releases)
Sub-Totals:
To demonstrate how Sub-Totals work, lets start with a very simple report shown below
As you see, this table view report has a Grand Total. The question is how does BI Server create this Total. Does it bring the entire data into BI Server memory and then does the sub-total or does it push that back in SQL or does it write to any temporary file.
To understand this, lets start with looking at the logical SQL in the advanced tab of this report.
SELECT Products.PROD_CATEGORY saw_0, Channels.CHANNEL_DESC saw_1, Sales.AMOUNT_SOLD saw_2 FROM "SH - Training Days - Relational" ORDER BY saw_0, saw_1
As you see this logical SQL does not actually represent the true logical SQL since the aggregation or the total part of the query is not anywhere in this SQL. So, to get the exact SQL we have to go to the Manage->Sessions & extract the logical SQL
SELECT Products.PROD_CATEGORY saw_0, Channels.CHANNEL_DESC saw_1, Sales.AMOUNT_SOLD saw_2, REPORT_SUM(saw_2 BY ) FROM "SH - Training Days - Relational" ORDER BY saw_0, saw_1
If you notice, there will be a new function called REPORT_SUM which will be passed to the logical SQL. REPORT_SUM is an aggregation function that BI Server uses to calculate the total (since we had chosen Report Based Total whenever applicable in the table view). One good thing is we can take the exact SQL and fire it in the logical SQL window
As you see BI Server calculates the total as a column based total. There are quite a few functions like REPORT_SUM. I will list some of the common ones here
-
SUM
-
AGGREGATE
-
MIN
-
MAX
-
COUNT
-
COUNTDISTINCT
After calculating the column based total, BI Server will get the first record value in the Report Sum column and display it as Sub-Total. Now that we know what BI Server does for calculating the sub-totals in a table view, lets look at how the BI Server pushes the REPORT_SUM function. If you look at the physical query for the above logical SQL, you will notice that the report level total is not pushed back to the database. So, the question is how BI EE does this total. To determine this lets look at the {OracleBIData}/tmp folder. You will notice that for every query that has a sub-total (with database cache and presentation services cache disabled) BI EE will create temporary files in this tmp directory. The writing of temp files to this directory is governed by 4 settings in the NQSConfig.ini
WORK_DIRECTORY_PATHS = "C:\Oracle\OracleBIData\tmp"; SORT_MEMORY_SIZE = 4 MB ; SORT_BUFFER_INCREMENT_SIZE = 256 KB ; VIRTUAL_TABLE_PAGE_SIZE = 128 KB ;
So, when we created the above report, BI Server had created a set of temp files in the above directory. These tmp files are actually similar to the cache files but are stored seperately. But how they are handled is unfortunately not documented (though we can arrive at certain conclusions by trying various scenarios). In some complex reports, when the sub-total calculation involves a lot of intermediate rows, you will notice a sizeable increase in the size of the tmp files
Pivot Tables:
Pivot table is probably the most widely used Answers component. Though widely used, many a times it can eventually lead to performance bottlenecks due to a lot of in-memory calculations. Most Pivot calculations are done in the BI Server memory and are not pushed back to the database(though we can control where it is executed to an extent). As a generic rule, in any implementation, if the repository is correctly designed, almost 90 to 95% of the reports can be achieved using normal table views(unless all the reports require a specific feature like Sub-Total at the top etc). The most common reason why pivot table is quite popular is the fact that it offers flexibility to end users in slicing/dicing the data. It is treated as an alternative to a multi-dimensional reporting solution.Pivot table can sometimes hog the temp space so much that even cache writing process might fail due to memory allocation errors. For example, lets take the above report and convert it into a pivot table as shown below
If you notice, all i have done is i have moved a couple of columns to the Excluded section of the Pivot table. If you look at the Physical query, the SQL will still contain Product Id and Channel Desc even though they are excluded.
select T24112.PROD_ID as c2, T24112.PROD_CATEGORY as c4, T26412.CHANNEL_ID as c6, sum(T24170.AMOUNT_SOLD) as c8, sum(T24170.QUANTITY_SOLD) as c9 from CHANNELS T26412, PRODUCTS T24112, SALES T24170 where ( T24112.PROD_ID = T24170.PROD_ID and T24170.CHANNEL_ID = T26412.CHANNEL_ID ) group by T24112.PROD_CATEGORY, T24112.PROD_ID, T24170.CHANNEL_ID, T26412.CHANNEL_ID order by c6
So what this means is BI Server will do all the aggregations in its own memory or by writing to tmp files. If we look at the tmp file directory
As you see, a sudden 59KB file has come into the tmp file directory due to the amount of custom calculations that the BI Server has to do in its memory. The size of the files directly are related to the amount of calculations & the number of cells retrieved. So, whenever you face an issue like a Write to Tmp directory error etc, then the first place to look at is how much of calculations are being done by the BI Server. Also, as much as possible try to reduce the amount of calculations to be done by the pivot table. As more users start accessing the system, more will the number of tmp files and hence more will be the I/O. This is something to take notice of while creating Pivot Table reports.