Metadata Modeling in the Database with Analytic Views
12.2, the latest Oracle database release provides a whole set of new features enhancing various aspects of the product including JSON support, Auto-List Partitioning and APEX news among others.
One of the biggest news in the Data Warehousing / Analytics area was the introduction of the Analytic Views, that as per Oracle's definition are
Metadata objects that enable the user to quickly and easily create complex hierarchical and dimensional queries on data in database tables and views
tl;dr
If you are on rush, here is an abstract of what you'll find in this looooong blog post:
Metadata modeling can now be done directly in the database using Analytic Views, providing to end users a way of querying database objects without needing a knowledge of joining conditions, aggregation functions or order by clauses.
This post will guide you through the creation of an analytic view that replicates a part of a OBIEE's Sampleapp business model. The latest part of the post is dedicated to understanding the usage of analytic views and the benefits for end users especially in cases when self-service BI tools are used.
If you are still around and interested in the topic please take a drink and sit comfortably, it will be a good journey.
Metadata Modeling
What are then the Analytics Views in detail? How are they going to improve end user's ability in querying data?
To answer above question I would take a step back. Many readers of this blog are familiar with OBIEE and its core: the Repository. The repository contains the metadata model from the physical sources till the presentation areas and includes the definition of:
- Joins between tables
- Hierarchies for dimensions
- Aggregation rules
- Security settings
- Data Filters
- Data Sources
This allows end users to just pick columns from a Subject Area and display them in the appropriate way without needing to worry about writing SQL or knowing how the data is stored. Moreover definitions are held centrally providing the famous unique source of truth across the entire enterprise.
The wave of self-service BI tools like Tableau or Oracle's Data Visualization Desktop provided products capable of querying almost any kind of data sources in a visual and intuitive way directly in the end user hands. An easy and direct access to data is a good thing for end user but, as stated above, requires knowledge of the data model, joins and aggregation methods.
The self-service tools can slightly simplify the process by providing some hints based on column names, types or values but the cruel reality is that the end-user has to build the necessary knowledge of the data source before providing correct results. This is why we've seen several times self-service BI tools being "attached" to OBIEE: get corporate official data from the unique source of truth and mash them up with information coming from external sources like personal Excel files or output of Big Data processes.
Analytics Views
Analytic Views (AV) take OBIEE's metadata modeling concept and move it at database level providing a way of organizing data in a dimensional model so it can be queried with simpler SQL statements.
The Analytical Views are standard views with the following extra options:
- Enable the definition of facts, dimensions and hierarchies that are included in system-generated columns
- Automatically aggregate the data based on pre-defined calculations
- Include presentation metadata
Analytics views are created with a CREATE ANALYTIC VIEW
statement, some privileges need to be granted to the creating user, you can find the full list in Oracle's documentation.
Every analytical view is composed by the following metadata objects:
- Attribute dimensions: organising table/view columns into attributes and levels.
- Hierarchies: defining hierarchical relationships on top of an attribute dimension object.
- Analytic view objects: defining fact data referencing both fact tables and hierarchies.
With all the above high level concepts in mind it's now time to try how Analytical Views could be used in a reporting environment.
Database Provisioning
For the purpose blog post I used Oracle's 12.2.0.1 database Docker image, provided by Gerald Venzl, the quickest way of spinning up a local instance. You just need to:
- Install Docker
- Download database installer from Oracle's website
- Place the installer in the proper location mentioned in the documentation
- Build Oracle Database 12.1.0.2 Enterprise Edition Docker image by executing
./buildDockerImage.sh -v 12.1.0.2 -e
- Running the image by executing
docker run --name db12c -p 1521:1521 -p 5500:5500 -e ORACLE_SID=orcl -e ORACLE_PDB=pdborcl -e ORACLE_CHARACTERSET=AL32UTF8 oracle/database:12.2.0.1-ee
The detailed parameters definition can be found in the GitHub repository. You can then connect via sqlplus to your local instance by executing the standard
sqlplus sys/pwd@//localhost:1521/pdborcl as sysdba
The password is generated automatically during the first run of the image and can be found in the logs, look for the following string
ORACLE AUTO GENERATED PASSWORD FOR SYS, SYSTEM AND PDBAMIN: XXXXxxxxXXX
Once the database is created it's time to set the goal: I'll try to recreate a piece of the Oracle's Sampleapp RPD model in the database using Analytic Views.
Model description
In this blog post I'll look in the 01 - Sample App
business model and specifically I'll try to replicate the logic behind Time, Product and the F0 Sales Base Measures
using Analytic Views.
Dim Product
The Sampleapp's D1 - Products (Level Based Hierarchy)
is based on two logical table sources: SAMP_PRODUCTS_D
providing product name, description, LOB and Brand and the SAMP_PROD_IMG_D
containing product images. For the purpose of this test we'll keep our focus on SAMP_PRODUCTS_D
only.
The physical mapping of Logical columns is shown in the image below.
Attribute Dimension
The first piece we're going to build is the attribute dimension, where we'll be defining attributes and levels. The mappings in above image can "easily" be translated into an attributes with the following SQL.
CREATE OR REPLACE ATTRIBUTE DIMENSION D1_DIM_PRODUCT
USING SAMP_PRODUCTS_D
ATTRIBUTES
(PROD_KEY as P0_Product_Number
CLASSIFICATION caption VALUE 'P0 Product Number',
PROD_DSC as P1_Product
CLASSIFICATION caption VALUE 'P1 Product',
TYPE as P2_Product_Type
CLASSIFICATION caption VALUE 'P2 Product Type',
TYPE_KEY as P2k_Product_Type
CLASSIFICATION caption VALUE 'P2k Product Type',
LOB as P3_LOB
CLASSIFICATION caption VALUE 'P3 LOB',
LOB_KEY as P3k_LOB
CLASSIFICATION caption VALUE 'P3k LOB',
BRAND as P4_Brand
CLASSIFICATION caption VALUE 'P4 Brand',
BRAND_KEY as P4k_Brand
CLASSIFICATION caption VALUE 'P4k Brand',
ATTRIBUTE_1 as P5_Attribute_1
CLASSIFICATION caption VALUE 'P5 Attribute 1',
ATTRIBUTE_2 as P6_Attribute_2
CLASSIFICATION caption VALUE 'P6 Attribute 2',
SEQUENCE as P7_Product_Sequence
CLASSIFICATION caption VALUE 'P7 Product Sequence',
TOTAL_VALUE as P99_Total_Value
CLASSIFICATION caption VALUE 'P99 Total Value')
Few pieces to note:
CREATE OR REPLACE ATTRIBUTE DIMENSION
: we are currently defining a dimension, the attributes and levels.USING SAMP_PRODUCTS_D
: defines the datasource, in our case the tableSAMP_PRODUCTS_D
. Only one datasource is allowed per dimension.PROD_KEY as P0_Product_Number
: using the standard notificationas
we can easily recaption columns namesCLASSIFICATION CAPTION ...
several options can be added for each attribute like caption or description
The dimension definition is not complete with only attribute declaration, we also need to define the levels. Those can be taken from OBIEE's hierarchy
For each level we can define:
- The level name, caption and description
- The Key
- the Member Name and Caption
- the Order by Clause
Translating above OBIEE's hierarchy levels into Oracle SQL
LEVEL BRAND
CLASSIFICATION caption VALUE 'BRAND'
CLASSIFICATION description VALUE 'Brand'
KEY P4k_Brand
MEMBER NAME P4_Brand
MEMBER CAPTION P4_Brand
ORDER BY P4_Brand
LEVEL Product_LOB
CLASSIFICATION caption VALUE 'LOB'
CLASSIFICATION description VALUE 'Lob'
KEY P3k_LOB
MEMBER NAME P3_LOB
MEMBER CAPTION P3_LOB
ORDER BY P3_LOB
DETERMINES(P4k_Brand)
LEVEL Product_Type
CLASSIFICATION caption VALUE 'Type'
CLASSIFICATION description VALUE 'Type'
KEY P2k_Product_Type
MEMBER NAME P2_Product_Type
MEMBER CAPTION P2_Product_Type
ORDER BY P2_Product_Type
DETERMINES(P3k_LOB,P4k_Brand)
LEVEL Product_Details
CLASSIFICATION caption VALUE 'Detail'
CLASSIFICATION description VALUE 'Detail'
KEY P0_Product_Number
MEMBER NAME P1_Product
MEMBER CAPTION P1_Product
ORDER BY P1_Product
DETERMINES(P2k_Product_Type,P3k_LOB,P4k_Brand)
ALL MEMBER NAME 'ALL PRODUCTS';
There is an additional DETERMINES
line in above sql for each level apart from Brand
, this is how we can specify the relationship between level keys. If we take the Product_LOB
example, the DETERMINES(P4k_Brand)
defines that any LOB in our table automatically determines a Brand (in OBIEE terms that LOB is a child of Brand).
Hierarchy
Next step is defining a hierarchy on top of the attribute dimension D1_PRODUCTS
defined above. We can create it just by specifying:
- the attribute dimension to use
- the list of levels and the relation between them
which in our case becomes
CREATE OR REPLACE HIERARCHY PRODUCT_HIER
CLASSIFICATION caption VALUE 'Products Hierarchy'
USING D1_DIM_PRODUCT
(Product_Details CHILD OF
Product_Type CHILD OF
Product_LOB CHILD OF
BRAND);
When looking into the hierarchy Product_hier
we can see that it's creating an OLAP-style dimension with a row for each member at each level of the hierarchy and extra fields like DEPT
, IS_LEAF
and HIER_ORDER
The columns contained in Product_hier
are:
- One for each Attribute defined in attribute dimension
D1_PRODUCTS
likeP0_PRODUCT_NUMBER
orP2K_PRODUCT_TYPE
- The member name, caption and description and unique name
- The level name in the hierarchy and related depth
- The relative order of the member in the hierarchy
- A field
IS_LEAF
flagging hierarchy endpoints - References to the parent level
Member Unique Names
A particularity to notice is that the MEMBER_UNIQUE_NAME
of Cell Phones
is [PRODUCT_TYPE].&[101]
which is the concatenation of the LEVEL
and the P2K_PRODUCT_TYPE
value.
One could expect the member unique name being represented as the concatenation of all the preceding hierarchy members, Brand and LOB, and the member key itself in a string like [PRODUCT_TYPE].&[10001]&[1001]&[101]
.
This is the default behaviour, however in our case is not happening since we set the DETERMINES(P3k_LOB,P4k_Brand)
in the attribute dimension definition. We Specified that Brand ([10001]
) and LOB ([1001]
) can automatically be inferred by the Product Type so there is no need to store those values in the member key. We can find the same setting in OBIEE's Product Type logical level
Dim Date
The basic D0 Dim Date
can be built starting from the table SAMP_TIME_DAY_D
following the same process as above. Like in OBIEE, some additional settings are required when creating a time dimension:
DIMENSION TYPE TIME
: the time dimension type need to be specifiedLEVEL TYPE <LEVEL_NAME>
: each level in the time hierarchy needs to belong to a precise level type chosen from:- YEARS
- HALF_YEARS
- QUARTERS
- MONTHS
- WEEKS
- DAYS
- HOURS
- MINUTES
- SECONDS
Attribute Dimension
Taking into consideration the additional settings, the Dim Date
column mappings in above image can be translated in the following attribute dimension SQL definition.
CREATE OR REPLACE ATTRIBUTE DIMENSION D0_DIM_DATE
DIMENSION TYPE TIME
USING SAMP_TIME_DAY_D
ATTRIBUTES
(CALENDAR_DATE AS TOO_CALENDAR_DATE,
PER_NAME_MONTH AS T02_PER_NAME_MONTH,
PER_NAME_QTR AS T03_PER_NAME_QTR,
PER_NAME_YEAR AS T04_PER_NAME_YEAR,
DAY_KEY AS T06_ROW_WID,
BEG_OF_MTH_WID AS T22_BEG_OF_MTH_WID,
BEG_OF_QTR_WID AS T23_BEG_OF_QTR_WID
)
LEVEL CAL_DAY
LEVEL TYPE DAYS
KEY TOO_CALENDAR_DATE
ORDER BY TOO_CALENDAR_DATE
DETERMINES(T22_BEG_OF_MTH_WID, T23_BEG_OF_QTR_WID,T04_PER_NAME_YEAR)
LEVEL CAL_MONTH
LEVEL TYPE MONTHS
KEY T22_BEG_OF_MTH_WID
MEMBER NAME T02_PER_NAME_MONTH
ORDER BY T22_BEG_OF_MTH_WID
DETERMINES(T23_BEG_OF_QTR_WID,T04_PER_NAME_YEAR)
LEVEL CAL_QUARTER
LEVEL TYPE QUARTERS
KEY T23_BEG_OF_QTR_WID
MEMBER NAME T03_PER_NAME_QTR
ORDER BY T23_BEG_OF_QTR_WID
DETERMINES(T04_PER_NAME_YEAR)
LEVEL CAL_YEAR
LEVEL TYPE YEARS
KEY T04_PER_NAME_YEAR
MEMBER NAME T04_PER_NAME_YEAR
ORDER BY T04_PER_NAME_YEAR
ALL MEMBER NAME 'ALL TIMES';
You may have noticed a different mapping of keys, member names and order by attributes. Let's take the CAL_MONTH
as example. It's defined by two columns
BEG_OF_MTH_WID
: used for joins and orderingPER_NAME_MONTH
: used as "display label"
PER_NAME_MONTH
in the YYYY / MM
format could be also used for ordering, but most of the times end user requests months in the MM / YYYY
format. Being able to set a ordering column different from the member name allows us to properly manage the hierarchy.
Hierarchy
Time hierarchy follows the same rules as the product one, no additional settings are required.
CREATE OR REPLACE HIERARCHY TIME_HIER
USING D0_DIM_DATE
(CAL_DAY CHILD OF
CAL_MONTH CHILD OF
CAL_QUARTER CHILD OF
CAL_YEAR);
Fact Sales
The last step in the journey is the definition of the analytic view of the fact table that as per Oracle's documentation
An analytic view specifies the source of its fact data and defines measures that describe calculations or other analytic operations to perform on the data. An analytic view also specifies the attribute dimensions and hierarchies that define the rows of the analytic view.
The analytic view definition contains the following specifications:
- The data source: the table or view that will be used for the calculation
- The columns: which columns from the source objects to use in the calculations
- The attribute dimensions and hierarchies: defining both the list of attributes and the levels of the analysis
- The measures: a set of aggregations based on the predefined columns from the data source.
Within analytical views definition a materialized view can be defined in order to store aggregated values. This is a similar to OBIEE's Logical Table Source setting for aggregates.
Analytic View Definition
For the purpose of the post I'll use SAMP_REVENUE_F
which is one of the sources of F0 Sales Base Measures
in Sampleapp. The following image shows the logical column mapping.
The above mappings can be translated in the following SQL
CREATE OR REPLACE ANALYTIC VIEW F0_SALES_BASE_MEASURES
USING SAMP_REVENUE_F
DIMENSION BY
(D0_DIM_DATE
KEY BILL_DAY_DT REFERENCES TOO_CALENDAR_DATE
HIERARCHIES (
TIME_HIER DEFAULT),
D1_DIM_PRODUCT
KEY PROD_KEY REFERENCES P0_Product_Number
HIERARCHIES (
PRODUCT_HIER DEFAULT)
)
MEASURES
(F1_REVENUE FACT REVENUE AGGREGATE BY SUM,
F10_VARIABLE_COST FACT COST_VARIABLE AGGREGATE BY SUM,
F11_FIXED_COST FACT COST_FIXED AGGREGATE BY SUM,
F2_BILLED_QTY FACT UNITS,
F3_DISCOUNT_AMOUNT FACT DISCNT_VALUE AGGREGATE BY SUM,
F4_AVG_REVENUE FACT REVENUE AGGREGATE BY AVG,
F21_REVENUE_AGO AS (LAG(F1_REVENUE) OVER (HIERARCHY TIME_HIER OFFSET 1))
)
DEFAULT MEASURE F1_REVENUE;
Some important parts need to be highlighted:
USING SAMP_REVENUE_F
: defines the analytic view source, in our case the tableSAMP_REVENUE_F
DIMENSION BY
: this section provides the list of dimensions and related hierarchies to take into accountKEY BILL_DAY_DT REFERENCES TOO_CALENDAR_DATE
: defines the join between the fact table and attribute dimensionHIERARCHIES (TIME_HIER DEFAULT)
: multiple hierarchies can be defined on top of an attribute dimension and used in an analytical view, however like in OBIEE only one will be used by defaultF1_REVENUE FACT REVENUE AGGREGATE BY SUM
: defines the measure with alias, source column and aggregation methodF2_BILLED_QTY FACT UNITS
: if aggregation method is not defined it replies on defaultSUM
F21_REVENUE_AGO
: new metrics can be calculated based on previously defined columns replicating OBIEE functions like time-series. The formula(LAG(F1_REVENUE) OVER (HIERARCHY TIME_HIER OFFSET 1))
calculates the equivalent of the OBIEE'sAGO
function for each level of the hierarchy.DEFAULT MEASURE F1_REVENUE
: defines the default measure of the analytic view
Using Analytic Views
After the analytic view definition, it's time to analyse what benefits end users have when using them. We are going to take a simple example: a query to return the Revenue and Billed Qty per Month and Brand.
Using only the original tables we would have the following SQL
SELECT D.CAL_MONTH,
D.BEG_OF_MTH_WID,
P.BRAND,
SUM(F.REVENUE) AS F01_REVENUE,
SUM(F.UNITS) AS F02_BILLED_QTY
FROM SAMP_REVENUE_F F
JOIN SAMP_PRODUCTS_D P
ON (F.PROD_KEY = P.PROD_KEY)
JOIN SAMP_TIME_DAY_D D
ON (F.BILL_DAY_DT = D.CALENDAR_DATE)
GROUP BY D.CAL_MONTH,
D.BEG_OF_MTH_WID,
P.BRAND
ORDER BY D.BEG_OF_MTH_WID,
P.BRAND;
The above SQL requires the knowledge of:
- Aggregation methods
- Joins
- Group by
- Ordering
Even if this is an oversimplification of the analytic view usage you can already spot that some knowledge of the base data structure and SQL language is needed.
Using the analytic views defined above, the query can be written as
SELECT TIME_HIER.MEMBER_NAME AS TIME_SLICE,
PRODUCT_HIER.MEMBER_NAME AS PRODUCT_SLICE,
F1_REVENUE,
F2_BILLED_QTY
FROM F0_SALES_BASE_MEASURES
WHERE TIME_HIER.LEVEL_NAME IN ('CAL_MONTH')
AND PRODUCT_HIER.LEVEL_NAME IN ('BRAND')
ORDER BY TIME_HIER.HIER_ORDER,
PRODUCT_HIER.HIER_ORDER;
As you can see, there is a simplification of the SQL statement: no more aggregation, joining conditions and group by predicates are needed. All the end-user has to know is the analytical view name, and the related hierarchies that can be used.
The additional benefit is that if we want to change the level of granularity of the above query we just need to change the WHERE
condition. E.g. to have the rollup per Year and LOB we just have to substitute
WHERE TIME_HIER.LEVEL_NAME IN ('CAL_MONTH')
AND PRODUCT_HIER.LEVEL_NAME IN ('BRAND')
with
WHERE TIME_HIER.LEVEL_NAME IN ('CAL_YEAR')
AND PRODUCT_HIER.LEVEL_NAME IN ('LOB')
without touching granularity, group by and order by statements.
Using Analytic Views in DVD
At the beginning of my blog post I wrote that Analytic Views could be useful when used in conjunction with self-service BI tools. Let's have a look at how the end user journey is simplified in the case of Oracle's Data Visualization Desktop.
Without AV the end-user had two options to source the data:
- Write the complex SQL statement with joining condition, group and order by clause in the SQL editor to retrieve data at the correct level with the related dimension
- Import the fact table and dimensions as separate datasources and join them together in DVD's project.
Both options require a SQL and joining conditions knowledge in order to being able to present correct data. Using Analytic Views the process is simplified. We just need to create a new source pointing to the database where the analytic views are sitting.
Next step is retrieve the necessary columns from the analytic view. Unfortunately analytic views are not visible from DVD object explorer (only standard table and views are shown)
We can however specify with a simple SQL statement all the informations we need like Time and Member Slice, the related levels and the order in hierarchy.
SELECT TIME_HIER.MEMBER_NAME AS TIME_SLICE,
PRODUCT_HIER.MEMBER_NAME AS PRODUCT_SLICE,
TIME_HIER.LEVEL_NAME AS TIME_LEVEL,
PRODUCT_HIER.LEVEL_NAME AS PRODUCT_LEVEL,
TIME_HIER.HIER_ORDER AS TIME_HIER_ORDER,
PRODUCT_HIER.HIER_ORDER AS PRODUCT_HIER_ORDER,
F1_REVENUE,
F2_BILLED_QTY
FROM F0_SALES_BASE_MEASURES
ORDER BY TIME_HIER.HIER_ORDER,
PRODUCT_HIER.HIER_ORDER;
You may have noted that I'm not specifying any WHERE
clause for level filtering: as end user I want to be able to retrieve all the necessary levels by just changing a filter in my DVD project. After including the above SQL in the datasource definition and amending the measure/attribute definition I can start playing with the analytic view data.
I can simply include the dimension's MEMBER_NAME
in the graphs together with the measures and add the LEVEL_NAME
in the filters. In this way I can change the graph granularity by simply selecting the appropriate LEVEL
in the filter selector for all the dimensions available.
One particular to notice however is that all the data coming from various columns like date, month and year are "condensed" into a single VARCHAR
column. In case of different datatypes (like date in the time dimension) this will prevent a correct usage of some DVD's capabilities like time series or trending functions. However if a particular type of graph is needed for a specific level, either an ad-hoc query or a casting operation can be used.
Conclusion
In this blog post we analysed the Analytic Views, a new component in Oracle Database 12.2 and how those can be used to "move" the metadata modeling at DB level to provide an easier query syntax to end-users.
Usually metadata modeling is done in reporting tools like OBIEE that offers additional set of features on top of the one included in analytic views. However centralized reporting tools like OBIEE are not present everywhere and, with the wave of self-service BI tools, analytic views represent a perfect method of enabling users not familiar with SQL to simply query their enterprise data.
If you are interested in understanding more about analytic views or metadata modeling, don't hesitate to contact us!
If you want to improve the SQL skills of your company workforce, check out our recently launched SQL for beginners training!