Leveraging Custom Python Scripts in Oracle Analytics Server
This post illustrates how to leverage custom Python scripts in Oracle Analytics Server to give you greater control and flexibility over specific data processing needs.
Oracle Analytics Server has enabled users to invoke custom Python/R scripts since the end of 2017. Unfortunately, this feature is not yet widely adopted, probably because the official documentation shows only how to upload a custom script, while the details about enabling the feature and embedding the script in XML format are not provided.
In this post, I'm going to illustrate how to leverage custom Python scripts in Oracle Analytics Server to give you greater control and flexibility over specific data processing needs.
Enabling Custom Scripts
The feature to invoke custom scripts is disabled by default and Doc ID 2675894.1 on My Oracle Support explains how to enable it.
Copy the attached updateCustomScriptsProperty.py script to $ORACLE_HOME/bi/modules/oracle.bi.publicscripts/internal/wlst.
Then execute the script using the WebLogic Scripting Tool:
Linux:
$ORACLE_HOME/oracle_common/common/bin/wlst.sh $ORACLE_HOME/bi/modules/oracle.bi.publicscripts/internal/wlst/updateCustomScriptsProperty.py true $DOMAIN_HOME $ORACLE_HOME
Windows:
%ORACLE_HOME%\oracle_common\common\bin\wlst.cmd %ORACLE_HOME%\bi\modules\oracle.bi.publicscripts\internal\wlst\updateCustomScriptsProperty.py true %DOMAIN_HOME% %ORACLE_HOME%
Restart Oracle Analytics Server to enable the feature.
Installing Additional Python Packages
Oracle Analytics Server 6.4 relies on Python 3.5.2 (sigh) which is included out-of-the-box with several packages. You can find them all under $ORACLE_HOME/bi/modules/oracle.bi.dvml/lib/python3.5/site-packages.
Call me paranoid or over-cautious, but to me it makes sense not to play around with the out-of-the-box version put in place by the installation. To avoid this, if any additional packages are required, I choose to firstly install another copy of the same Python version (3.5.2) in another location on the server - this way, I know I can add to or make changes without possibly affecting any other standard functionality that uses the out-of-the-box version.
Installing Python 3.5.2 on Linux in 2022 could be a bit tricky since the Python official website does not host the installers for older versions, but only the source code.
First of all download the source code for Python 3.5.2.
$ wget https://www.python.org/ftp/python/3.5.2/Python-3.5.2.tgz
Now extract the downloaded package.
$ sudo tar xzf Python-3.5.2.tgz
Compile the source code on your system using altinstall.
$ cd Python-3.5.2
$ sudo ./configure
$ sudo make altinstall
Then install all required packages using pip. My example needs langdetect and in order to make it work correctly with Python 3.5.2 I decided to install an older version of it (1.0.7). You should always verify which versions of packages used in your code are compatible with Python 3.5.2 and install them explicitly, otherwise pip will automatically pick the latest ones (which may not be compatible).
$ sudo pip3.5 install langdetect==1.0.7
Edit the obis.properties file located under $DOMAIN_HOME/config/fmwconfig/bienv/OBIS, set the PYTHONPATH variable to ensure the packages can be found by Oracle Analytics Server, and restart the services.
PYTHONPATH=$ORACLE_HOME/bi/bifoundation/advanced_analytics/python_scripts:/usr/local/lib/python3.5/site-packages
export PYTHONPATH
Anatomy of a Custom Python Script
To be able to use a custom Python script with Oracle Analytics Server, we need to embed it in a simple pre-defined XML format.
The XML must contain one root element <script> that is the parent of all other elements:
<?xml version="1.0" encoding="UTF-8"?>
<script>
...
</script>
The <scriptname> element indicates the name of your script:
<scriptname>py.DetectLanguage</scriptname>
According to the documentation, <scriptlabel> should indicate the name of the script as visible for end users, but it seems to be ignored once the script has been uploaded to Oracle Analytics Server. However, if you don't include this element in the XML you will get an error notification while uploading the script.
<scriptlabel>Detect Language (py)</scriptlabel>
<target> refers to the type of script that you are embedding in the XML:
<target>python</target>
In order to use the script in data flows, it's mandatory to include the <type> element and set it to execute_script:
<type>execute_script</type>
<scriptdescription> is straightforward to understand and provides a description of the script as explained by its developer. You can also specify the version of the script in the <version> element.
<scriptdescription>
<![CDATA[
Determine the language of a piece of text.
]]>
</scriptdescription>
<version>v1</version>
The <outputs> element lists the outputs returned by the script. In the example, the script returns one column called language. <displayName> and <datatype> elements refer to the name displayed in the user interface and the data type of the outputs.
<outputs>
<column>
<name>language</name>
<displayName>language</displayName>
<datatype>varchar(100)</datatype>
</column>
</outputs>
The <options> element indicates the input parameters to the script. There is also a special parameter includeInputColumns which lets users choose whether to append output columns to the input dataset and return, or just return the output columns. In the example, the script requires one column input (text) and always append the output column (language) to the input dataset.
<options>
<option>
<name>text</name>
<displayName>Text</displayName>
<type>column</type>
<required>true</required>
<description>The input column for detecting the language</description>
<domain></domain>
<ui-config></ui-config>
</option>
<option>
<name>includeInputColumns</name>
<displayName>Include Input Columns</displayName>
<value>true</value>
<required>false</required>
<type>boolean</type>
<hidden>true</hidden>
<ui-config></ui-config>
</option>
</options>
And lastly, the <scriptcontent> element must contain the Python code. You have to import all required packages and implement your data transformation logic in the obi_execute_script function:
- The data parameter provides access to the input dataset in a Pandas DataFrame structure.
- The args parameter provides access to the input parameters as defined in the <options> XML element.
- The function must return a Pandas DataFrame structure.
- Oracle Analytics Server will automatically execute this function when you invoke the custom script.
<scriptcontent><![CDATA[
import pandas as pd
from langdetect import detect, DetectorFactory
def obi_execute_script(data, columnMetadata, args):
language_array = []
DetectorFactory.seed = 0
for value in data[args['Text']]:
language_array.append(detect(value))
data.insert(loc=0, column='language', value=language_array)
return data
]]></scriptcontent>
In the example above, the values in the input column are analyzed to detect their language using the langdetect package, the results are then collected into an array and returned alongside the input dataset. The source code is attached below, feel free to use it, but remember to install all required packages first!
Custom Python Scripts in Action
Once wrapped into XML, administrators can upload custom scripts into Oracle Analytics Server. Once uploaded they can be shared and executed by other users.
In the Home page, click on the Create button and select the Script option.
Drag and drop your custom script to the Add Script dialog, then click OK.
The uploaded custom script is now available for use and displayed in the Scripts tab on the Machine Learning page.
To invoke the script from a data flow you have to include the Add Custom Script step.
Select the script that you want to execute, and click OK.
Configure the step by specifying any required input and the outputs that you want to include into the result set. In the example, I chose to detect the language for the review column.
Save the output data by adding the Save Data step and run the data flow to execute the custom script.
Conclusion
Business analysts and end-users often want greater control when performing data preparation tasks. In this context, leveraging custom Python/R scripts into Oracle Analytics Server can give you full control and flexibility over specific data processing needs.
If you are looking into leveraging custom Python/R scripts into Oracle Analytics Server and want to find out more, please do get in touch or DM us on Twitter @rittmanmead. Rittman Mead can help you with a product demo, training and assist within the development process.