Spatial Analytics Made Easy: Oracle Spatial Studio
Let's say we need to understand where our company needs to open a new shop. Most of the time the decision is driven by gut feeling and some knowledge of the market and client base, but what if we could have visual insights about where are the high density zones with customers not covered by a shop nearby like in the map below?
Well... welcome Oracle Spatial Studio!
Spatial Studio is Oracle's new tool for creating spatial analytics with a visual GUI. It uses Oracle Spatial database functions in the backen d exposed with an interface in line with the Oracle Analytics Cloud one. Let's see how it works!
QuickStart Installation
First of all we need to download Spatial Studio from the Oracle web page, for this initial test I downloaded the "Quick Start", a self contained version pre-deployed in a lightweight application server. For more robust applications you may want to download the EAR file deployable in Weblogic.
Once downloaded and unzipped the file, we just need to verify we have a Java JDK 8 (update 181 or higher) under the hood and we can immediately start Oracle Spatial Studio with the ./start.sh
command.
The command will start the service on the local machine that can be accessed at https://localhost:4040/spatialstudio
. By default Oracle Spatial Studio Quickstart uses HTTPS protocol with self-signed certificates, thus the first time you access the URL you will need to add a security exception in your browser. The configurations such as port, JVM parameters, host and HTTP/HTTPS protocol can be changed in the conf/server.json
file.
We can then login with the default credentials admin/welcome1
The first step in the Spatial Studio setup is the definition of the metadata connection type. This needs to point to an Oracle database with the spatial option. For my example I initially used an Oracle Autonomous Data Warehouse, for which I had to drop the wallet and specify the schema details.
Once logged in, the layout and working flows are very similar to Oracle Analytics Cloud making the transition between the two very easy (more details on this later on). In the left menu we can access, like in OAC, Projects (visualizations), Data, Jobs and the Console.
In order to do Spatial Analysis we need to start from a Dataset, this can be existing tables or views, or we can upload local files. To create a Dataset, click on Create and Dataset
We have then three options:
- Upload a Spreadsheet containing spatial information (e.g. Addresses, Postcodes, Regions, Cities etc)
- Upload a Shapefile containing geometric locations and associated attributes.
- Use spatial data from one of the existing connections, this can point to any connection containing spatial information (e.g. a table in a database containing customer addresses)
Sample Dataset with Mockaroo
I used Mockaroo, a realistic data generator service, to create two excel files: one containing customers with related locations and a second one with shops and related latitude and longitude. All I had to do was to select which fields I wanted to include in my file and the related datatype.
For example, the list of shop dataset contained the following columns:
- Id: as row number
- Shop Name: as concatenation of Shop and the Id
- Lat: Latitude
- Long: Longitude
- Dept: the Department (e.g. Grocery, Books, Health&Beauty)
Mockaroo offers a perfect service and has a free tier of datasets with less than 1000 rows which can be useful for demo purposes. For each column defined, you can select between a good variety of column types. You can also define your own type using regular expressions!
Adding the Datasets to Oracle Spatial Studio
Once we have the two datasources in Excel format, it's time to start playing with Spatial Studio. We first need to upload the datasets, we can do it via Create and Dataset. Starting with the Customer.xlsx
one. Once selected the file to upload Spatial Studio provides (as OAC) an overview of the dataset together with options to change configurations like dataset name, target destination (metadata database) and column names.
Once modified the table name to TEST_CUSTOMERS
and clicked on Submit Spatial Studio starts inserting all the rows into the SPATIAL_STUDIO
connection with a routine that could take seconds or minutes depending on the dataset volume. When the upload routine finishes I can see the TEST_CUSTOMERS
table appearing in the list of datasets.
We can immediately see the yellow warning sign next to the dataset name, it's due to the fact that we have a dataset with no geo-coded information, we can solve this problem by clicking on the option button and then Prepare and Geocode Addresses
Oracle Spatial Studio will suggest, based on the column content, some geo-type matching e.g. City Name, Country and Postal Code. We can use the defaults or modify them if we feel they are wrong.
Once clicked on Apply the geocoding job starts.
Once the job ends, we can see the location icon next to our dataset name
We can do the same for the Shops.xlsx
dataset, starting by uploading it and store it as TEST_SHOPS
dataset.
Once the dataset is uploaded I can geo-locate the information based on the Latitude
and Longitude
, I can click on the option button and the selecting Prepare and Create Lon/Lat Index. Then I'll need to assign the Longitude
and Latitude
column correctly and click on Ok.
Spatial Analytics
Now it's time to do some Spatial Analysis so I can click on Create and Project and I'll face an empty canvas by default
The first step is to add a Map, I can do that by selecting the visualizations menu and then dragging the map to the canvas.
Next step is to add some data by clicking on Data Elements and then Add Dataset
I select the TEST_CUSTOMERS
dataset and add it to the project, then I need to drag it on top of the map to visualize my customer data.
Oracle Spatial Studio Offers several options to change the data visualizations like color, opacity, blur etc.
Now I can add the TEST_SHOPS
dataset and visualize it on the map with the same set of steps followed before.
It's finally time for spatial analysis! Let's say, as per initial example, that I want to know which of my customers doesn't have any shops in the nearest 200km. In order to achieve that I need to first create buffer areas of 200km around the shops, by selecting the TEST_SHOPS
datasource and then clicking on the Spatial Analysis.
This will open a popup window listing a good number of spatial analysis, by clicking on the Transform tab I can see the Add a buffer of a specified distance option.
Unfortunately the buffer function is not available in ADW at the moment.
I had to rely on an Oracle Database Cloud Service 18c Enterprise Edition - High Performance (which includes the Spatial option) to continue for my metadata storage and processing. Few Takeaways:
- Select 18c (or anything above 12.2): I hit an issue
ORA-00972: identifier is too long
when importing the data in a 12.1 Database, which (thanks StackOverflow) is fixed as of 12.2. - High Performance: This includes the Spatial Option
Once I used the DBCS as metadata store, I can finally use the buffer function and set the parameter of 200km around the shops.
The TEST_SHOPS_BUFFER
is now visible under Analysis and can be added on top of the Map correctly showing the 200km buffer zone.
I can understand which customers have a shop in the nearest 200k by creating an analysis and select the option "Return shapes within a specified distance of another"
In the parameters I can select the TEST_CUSTOMERS
as Layer to be filtered, the TEST_SHOPS
as the Layer to be used as filter and the 200Km
as distance.
I can then visualize the result by adding the TEST_CUSTOMERS_WITHIN_DISTANCE
layer in the map.
TEST_CUSTOMERS_WITHIN_DISTANCE
contains the customers already "covered" by a shop in the 200km range, what I may want to do now is remove them from my list of customers in order to do analysis on the remaining ones, how can I do that? Unfortunately in the first Spatial Studio version there is no visual way of doing DATASET_A MINUS DATASET_B
but, hey, it's just the first incarnation and we can expect that type of functions and many others to be available in future releases!
The following paragraph is an in-depth analysis in the database of functions that will probably be exposed in Spatial Studio's future version, so if not interested, progress directly to the section named "Progressing in the Spatial Analysis".
A Look in the Database
Since we want to achieve our goal of getting the customers not covered by a shop now, we need to look a bit deeper where the data is stored: in the database. This gives us two opportunities: check how Spatial Studio works under the covers and freely use SQL to achieve our goals (DATASET_A MINUS DATASET_B
).
First let's have a look at the tables created by Spatial Studio: we can see some metadata tables used by studio as well as the database representation of our two excel files TEST_CUSTOMERS
and TEST_SHOPS
.
Looking in depth at the metadata we can also see a table named SGTECH$TABLE
followed by an ID
. That table collects the information regarding the geo-coding job we executed against our customers dataset which were located starting from zip-codes and addresses. We can associate the table to the TEST_CUSTOMERS
dataset with the following query against the SGTECH_OBJECTS
metadata table.
SELECT NAME,
JSON_VALUE(data, '$.gcHelperTableName') DATASET
FROM SGTECH_OBJECT
WHERE OBJECTTYPE='dataset'
AND NAME='TEST_CUSTOMERS';
The SGTECH$TABLEA004AA549110B928755FC05F01A3EF89
table contains, as expected, a row for each customer in the dataset, together with the related geometry if the geo-coding was successful and some metadata flags like GC_ATTEMPTED
, GC_STATUS
and GC_MATCH_CODE
stating the accuracy of the geo-coding match.
What about all the analysis like the buffer and the customers within distance? For each analysis Spatial Studio creates a separate view with the SGTECH$VIEW
prefix followed by an ID
.
To understand which view is referring to which analysis we need to query the metadata table SGTECH_OBJECTS
with a query like
SELECT NAME,
JSON_VALUE(data, '$.tableName') DATASET
FROM SGTECH_OBJECT
WHERE OBJECTTYPE='dataset'
With the following result
We know then that the TEST_CUSTOMERS_WITHIN_DISTANCE
can be accessed by the view SGTECH$VIEW0B2B36785A28843F74B58B3CCF1C51E3
and when checking its SQL we can clearly see that it executes the SDO_WITHIN_DISTANCE
function using the TEST_CUSTOMERS.GC_GEOMETRY
, the TEST_SHOPS
columns LONGITUDE
and LATITUDE
and the distance=200 unit=KILOMETER
parameters we set in the front-end.
CREATE OR replace force editionable view "SPATIAL_STUDIO"."SGTECH$VIEW0B2B36785A28843F74B58B3CCF1C51E3"
SELECT
...
FROM
"TEST_CUSTOMERS" "t1",
"TEST_SHOPS" "t2"
WHERE
sdo_within_distance("t1"."GC_GEOMETRY",
spatial_studio.sgtech_ptf("t2"."LONGITUDE", "t2"."LATITUDE"),
'distance=200 unit=KILOMETER'
) = 'TRUE';
Ok, we now understood which view contains the data, thus we can create a new view containing only the customers which are not within the 200km distance with
CREATE VIEW TEST_CUSTOMERS_NOT_WITHIN_DISTANCE AS
SELECT
t1.id AS id,
t1.first_name AS first_name,
t1.last_name AS last_name,
t1.email AS email,
t1.gender AS gender,
t1.postal_code AS postal_code,
t1.street AS street,
t1.country AS COUNTRY,
t1.city AS city,
t1.studio_id AS studio_id,
t1.gc_geometry AS gc_geometry
FROM
test_customers t1
WHERE
id NOT IN (
SELECT
id
FROM
spatial_studio.sgtech$view0b2b36785a28843f74b58b3ccf1c51e3
);
Progressing in the Spatial Analysis
In the previous paragraph we created a view in the database named TEST_CUSTOMERS_NOT_WITHIN_DISTANCE
containing the customer without a shop in a 200km radius. We can now import it into Spatial Studio by creating a new dataset, selecting the connection to the database (in our case named SPATIAL_STUDIO
) as source and then the newly created TEST_CUSTOMERS_NOT_WITHIN_DISTANCE
view.
The dataset is added, but it has a yellow warning icon next to it
Spatial Studio requests us to define a primary key, we can do that by accessing the properties of the dataset, select the Columns tab, choosing which column acts as primary key and validate it. After this step I can visualize this customer in a map.
What's next? Well If I want to open a new shop, I may want to do that where there is a concentration of customers, which is easily visualizable with Spatial Studio by changing the Render Style
to Heatmap
.
With the following output
We can clearly see some major concentrations around Dallas, Washington and Minneapolis. Focusing more on Dallas, Spatial Studio also offers the option to simulate a new shop in the map and calculate the 200km buffer around it. I can clearly see that adding a shop halfway between Oklahoma City and Dallas would allow me to cover both clients within the 200km radius.
Please remember that this is a purely demonstrative analysis, and some of the choices, like the 200km buffer are expressly simplistic. Other factors could come into play when choosing a shop location like the revenue generated by some customers. And here it comes the second beauty of Oracle Spatial Studio, we can export datasets as GeoJSON or CSV and include them in Data Visualization.
For example I can export the data of TEST_CUSTOMERS_NOT_WITHIN_DISTANCE
from Spatial Studio and include then in a Data Visualization Project blending them with the Sales related to the same customers.
I can now focus not only on the customer's position but also on other metrics like Profit
or Sales Amount
that I may have in other datasets. For another example of Oracle Spatial Studio and Data Visualization interoperability check out this video from Oracle Analytics Senior Director Philippe Lions.
Conclusions
Spatial analytics made easy: this is the focus of Oracle Spatial Studio. Before spatial queries were locked down at database level with limited access from an analyst point of view. Now we have a visual tool with a simple GUI (in line with OAC) that easily enables spatial queries for everybody!
But this is only the first part of the story: the combination of capabilities achievable when mixing Oracle Spatial Studio and Oracle Analytics Cloud takes any type of analytics to the next level!