Tracing Data Lineage and Impact Analysis in Oracle Enterprise Metadata Management 12c

At the beginning of the year, I wrote a couple of posts about Oracle Enterprise Metadata Management, one about how to install the tool and another one explaining the harvesting (importing) and stitching (relating) processes. A lot of things have happened since then, including a new version of OEMM released on May with some nice features as HTML 5 support, a workflow for business glossary and including the concept of groups in security.

OEMM is the Oracle solution for data governance and lifecycle management.  “Data governance (DG) refers to the overall management of the availability, usability, integrity, and security of the data employed in an enterprise”. So, the process of knowing how my data is related, the integrity of it, who has access to which part of the data and with which application, from where people can access the data, defining the stewards of the data, etc are part of data governance.

To refresh the main concepts, the first thing that you have to do when you start working with OEMM is the harvesting or importing process. We import the metadata creating one model for each different metadata source. After the successful importing process, we need to create a configuration to relate all the required models.

One of the main features of OEMM is to perform Data Flow Lineage and Impact Analysis. Why is this so important? Because the typical problem that we face in IT is to understand how all the company data is related, what will happen if I modify one column in a source schema,  or how I can find out where a particular piece of data come from. In this post I will talk about these process which are exactly the same as in the previous version.

I created an environment where I installed the latest release of OEMM (12.1.3.0.2), Oracle Database 12c that contains the source, staging and DW schemas and ODI 12c which is the ELT tool to load and transform the data into our DW schema. In addition I also have OBIEE 11g which repository is based on the DW schema and some OBI analyses to query the data. I imported all the metadata from these different sources and create a configuration that includes all the models created.

Data Impact Analysis

The process of Data Impact Analysis is to find out which data items in target database and/or applications are affected if we change one source object, like a column or a table or another source object.

There are many ways to start the tracing data impact process in OEMM. We can start the analysis impact process in OEMM from the model where the object belongs or from a diagram.

We can open the model from the repository panel or from the configuration Architecture diagram. Once the model is opened, select the object that you want to perform the impact analysis, right click on it, and select Trace Data Impact or Trace Lineage (Advanced) option. Either option opens the same dialog window.

Then, we need to choose between the model and the configuration as the scope for this impact analysis. The most common option is to select one configuration to see the consequences of a possible change in all the target models present in this configuration. Also we have the option to see the result in graphic or in text mode.

In our example, we are going to trace the data impact of the ORDERS table which is in the database source model and is the source for the fact table in the DW.

In the following picture, you will find the resulted diagram of the data impact analysis. In order to see the diagrams better in OEMM,  you can collapse the Properties windows at your right and press the Fit to Content button in the diagram menu.

There are plenty of interesting things that you can do from here. First, if you take a closer look you will see that the lines between the Orders Source and Order Staging as well as the ones from Orders Staging and Orders_DW are ticker than the ones that connect Orders_DW to the OBIEE Model. This means there are some ETL process in the middle of these connections. When we right click on one of these arrows and select Trace ETL details, a new tab is opened inside the configuration and will show the mappings involved and the operations (a JOIN in our example). Also if we expand the windows properties and select the JOIN element we can see its condition.

Coming back to the Data Impact diagram, you can select one particular object like a column and press the highlight path button to emphasise only a specific data flow. This is very useful when you have many objects in the diagram and you need to focus in some particular data path.

If you want to keep this diagram saved so you can come back easily to it, you can add a bookmark. You can create folders to organise your bookmarks and they will appear at the bottom left of the screen.

 

Tracing Data Lineage

The Trace Data Lineage process is to find the path from the source to the target object that you selected.

In OEMM, tracing data lineage is a very similar process to impact analysis. In fact, it uses the same dialog window but now you should select the Trace Data Lineage or Trace Lineage (Advanced) option. Again, either option opens the same dialog window.

In our example, we are going to use the Table View from a particular OBI Analysis. We will perform first the data lineage using the model as a scope and then run another one in the context of the configuration in order to make clear the differences between these two data lineages.

Once I have the OBIEE Model opened, I double-click the required analysis and the analysis with all its components (criteria and layout) will appear in the Metadata Browser. I will run first the data lineage based on the model.

Another tab is shown with the data lineage based on the model. You can select one measure for example and see the properties. It will show details as the expression or formula (if it is a calculated item) and the default aggregation. In the diagram appears three main objects (that can be collapsed or expanded as required), one that represents the physical layer in the OBI repository (rpd), another for the Presentation layer and finally the table view in the OBI Analysis.

As you probably realise by now, there are a lot of tabs and panels that appears as you open different objects and execute some tasks. The panels are collapsible and can be resized, so you will need to do this a lot in order to visualise what you want.

We are going to repeat the same process but now selecting the configuration that contains the OBIEE model. In this new diagram you will see where the data in the OBI Analysis is coming from and the relationship with the other models in the configuration including the ODI model and the different database schemas. You can expand or collapse the objects in the diagram, tracing ETL details, highlight the path and all the features that we've seen for the data impact diagram as it is the same diagram.

Another useful and new feature of this release is the quick find that allows you to search for specific words and show you a list of all the objects in the diagram that matches the search criteria. If you double-click in one of this results, the object will appear highlighted in the diagram.

Model Connection Overview

The Model Connection Overview diagram shows the connection of the objects inside a model. As its name suggests is just an overview of the connections. You cannot go deeper into the object details, but is useful to have a rough idea of the object relationship in a model.  In the next pictures you will see an example of he Model Connection diagram of the OBIEE model and the diagram for a particular mapping in the ODI model. You can also save it as a bookmark as the other diagrams.

And this is how you can perform data lineage and impact analysis in OEMM, analysing the relationships among your company data.