Productionising Machine Learning Models

Introduction:

Productionising, or deploying, a model is the process of making the model accessible to end users so that its predictions or insights can start generating value. This will often involve deploying a pipeline of data transformations, or possibly moving some data transformations to an ETL tool.

Models can be deployed in a number of ways, and what is right for your use case will vary depending on a number of factors we’ll mention below. Generally, data scientists collaborate with data engineers or ML Ops engineers during the model deployment phase.

Considerations:

There are several things to consider when creating and productionising models (hopefully these will have been thought about before starting model development).

How often do you require the predictions?

Depending on how often the input data for your model is changed or updated, and how often you need the outputs from the model, your model can be deployed for batch consumption or real-time consumption.

For batch consumption, the model can be scheduled to run at predetermined times or intervals, and the predictions stored and reported on alongside your other data. A common example is writing the predictions directly to a database and reporting on them using the same analytics tool as the rest of your data.
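
As a minimal sketch of this pattern, a batch scoring script (scheduled with cron or an orchestration tool) might look like the following; the connection string, table names, and model path are illustrative assumptions:

    # Illustrative batch scoring sketch: connection string, table names,
    # and model path are assumptions, not a prescribed setup.
    import joblib
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("postgresql://user:password@host/analytics")
    model = joblib.load("model.joblib")  # previously trained and serialised model

    # Read the latest batch of inputs, score them, and write the predictions
    # back to the database so they can be reported on with your other data.
    features = pd.read_sql("SELECT * FROM customer_features", engine)
    inputs = features.drop(columns=["customer_id"])
    features["prediction"] = model.predict(inputs)
    features[["customer_id", "prediction"]].to_sql(
        "customer_predictions", engine, if_exists="replace", index=False
    )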

For real-time consumption, a deployed model would need to be invoked and serve a prediction on a case-by-case basis. If you require real-time predictions, you will also need to consider how quickly the model must respond, the size of the data, and the number of requests you expect it to receive.

How will the predictions be consumed?

Who or what will ultimately be using the output from the model? Do the predictions need to be integrated into a platform or system your business already uses?

If a longer latency between prediction and reporting is acceptable, can the outputs be written to a data store you already use and analysed in the same way as the rest of your data? For example, writing to a database and then viewing in an analytics platform.

If the predictions are required in real time, the model could be deployed as a web API.
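
As an illustrative sketch (the model file, route, and payload shape are assumptions), a trained model could be wrapped in a minimal Flask endpoint:

    # Minimal real-time prediction API sketch using Flask. The model file,
    # route, and expected JSON payload are illustrative assumptions.
    import joblib
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    model = joblib.load("model.joblib")  # load the trained model once at startup

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()  # e.g. {"features": [[5.1, 3.5, 1.4, 0.2]]}
        prediction = model.predict(payload["features"])
        return jsonify({"prediction": prediction.tolist()})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)

In practice you would also add input validation, authentication, and enough compute behind the endpoint to meet the expected request volume.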

What data pipelines or transformations will also need to be deployed?

The data used in model training will have gone through various data manipulation steps. While some of these might need to be replicated in an ETL tool, others could be model-specific, and it might make sense to implement them separately in dedicated tasks or steps: for example, applying encoding or normalisation in the same way, and within a similar environment, as used for training the model.
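
One common way to keep model-specific transformations consistent between training and serving is to bundle them with the model itself, for example in a scikit-learn Pipeline (a sketch with illustrative column names):

    # Bundle preprocessing with the model so encoding and normalisation are
    # applied identically at training time and at prediction time.
    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    preprocess = ColumnTransformer([
        ("encode", OneHotEncoder(handle_unknown="ignore"), ["region"]),  # illustrative columns
        ("scale", StandardScaler(), ["age", "spend"]),
    ])
    pipeline = Pipeline([("preprocess", preprocess), ("model", LogisticRegression())])
    # After pipeline.fit(X_train, y_train), the single fitted object can be
    # serialised and deployed, carrying its transformations with it.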

The model may also depend on specific environments and packages, which you will need to keep consistent between model training and deployment.
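
A simple way to do this is to record the training environment alongside the model artefact; the package list below is an illustrative sketch:

    # Record the exact package versions used at training time so the same
    # environment can be recreated at deployment. A minimal sketch.
    import json
    import sys
    from importlib.metadata import version

    environment = {
        "python": sys.version,
        "packages": {pkg: version(pkg) for pkg in ["scikit-learn", "pandas", "numpy"]},
    }
    with open("model_environment.json", "w") as f:
        json.dump(environment, f, indent=2)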

How will you monitor the model once deployed?

When productionising your model, you will want some way to gauge whether it's working as you expected. While a model could be deployed without any monitoring in place, this is strongly discouraged. Aspects to consider include frameworks for logging, performance monitoring, and reporting model metrics over time.
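
Even basic logging around each prediction call provides raw material for later monitoring; a minimal sketch:

    # Minimal logging around a prediction call: latency and outcome are
    # recorded so they can be analysed and alerted on later. A sketch.
    import logging
    import time

    logging.basicConfig(level=logging.INFO)
    logger = logging.getLogger("model_service")

    def predict_with_logging(model, features):
        start = time.perf_counter()
        prediction = model.predict(features)
        latency_ms = (time.perf_counter() - start) * 1000
        logger.info("n_rows=%d latency_ms=%.1f prediction=%s",
                    len(features), latency_ms, prediction.tolist())
        return prediction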

Ideally, you should have mechanisms to detect variations in the data inputs for the model, and processes in place to monitor the performance of the model outputs over time. Generally, models become less accurate as time goes on, since the data the model was trained on becomes increasingly out of date and the underlying trends in the data may shift. This shows up as a drop in model accuracy or performance, and is a sign that your model should be retrained on more up-to-date data. You might also need to think about a model feedback loop, where actions taken off the back of your predictions might be affecting the apparent performance of the model.
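
As a simple illustration of input drift detection, a two-sample Kolmogorov-Smirnov test can compare a feature's live distribution against its training distribution (the feature and significance level below are illustrative):

    # Simple input drift check: compare the live distribution of a numeric
    # feature against its training distribution with a two-sample KS test.
    from scipy.stats import ks_2samp

    def check_feature_drift(training_values, live_values, alpha=0.05):
        statistic, p_value = ks_2samp(training_values, live_values)
        if p_value < alpha:  # distributions differ significantly: possible drift
            print(f"Possible drift detected (KS={statistic:.3f}, p={p_value:.4f})")
        return p_value

    # e.g. check_feature_drift(train_df["age"], recent_requests_df["age"])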

If your model is deployed in real-time you should also consider monitoring how often it's being invoked and if the compute resources meet the demand.

How will you update the model?

Machine learning models will need to be updated to adapt to changes in the environment or input data over time, so it's important to establish a process for updating and deploying new models.

In some use cases, it might be possible for models to be automatically or semi-automatically retrained on new data and deployed. This automation of the machine learning lifecycle is referred to as MLOps.

Your models should always be checked or assessed before deployment. Even if these checks themselves are automated, there should be a robust system to ensure any automatically trained models are making reasonable predictions.
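
A minimal automated check before promoting a retrained model might compare its holdout performance against the current production model (the metric and tolerance are illustrative assumptions):

    # Simple pre-deployment gate: only promote the retrained model if it is
    # at least as good as the current one on held-out data. A sketch.
    from sklearn.metrics import accuracy_score

    def should_promote(candidate, current, X_holdout, y_holdout, tolerance=0.01):
        candidate_acc = accuracy_score(y_holdout, candidate.predict(X_holdout))
        current_acc = accuracy_score(y_holdout, current.predict(X_holdout))
        # Require the candidate to be within tolerance of, or better than,
        # the current model before deploying it.
        return candidate_acc >= current_acc - tolerance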

Technology stacks:

While models can be deployed and productionised in a range of different ways using open-source tools and your own infrastructure, there is now a range of technology stacks and machine learning platforms that make productionising easier.

OCI Data Science

One of these is the Oracle Cloud Infrastructure (OCI) Data Science service, which includes:

The Model Catalog - This is a managed repository of models and associated documentation, including model provenance, introspection tests, taxonomy, and input and output schemas. These models can be shared throughout a team, and they can be loaded into, and used within, Data Science Notebook Sessions. Any associated conda environment can also be specified, or saved to an OCI Object Storage bucket, to be used with these models or their deployment.
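
As a sketch of this workflow using Oracle's Accelerated Data Science (ADS) SDK, following its documented prepare/save pattern (the conda environment slug, display name, and sample data are illustrative, so check the current ADS documentation for exact signatures):

    # Saving a scikit-learn model to the OCI Model Catalog with the ADS SDK.
    # A sketch of the prepare/verify/save pattern; parameter values such as
    # the conda slug and display name are illustrative assumptions.
    from ads.model.framework.sklearn_model import SklearnModel

    sklearn_model = SklearnModel(estimator=model, artifact_dir="./model_artifact")
    sklearn_model.prepare(
        inference_conda_env="generalml_p38_cpu_v1",  # illustrative service conda slug
        X_sample=X_train.head(),                     # used to generate the input schema
    )
    sklearn_model.verify(X_train.head())             # test the artifact locally
    model_id = sklearn_model.save(display_name="example-model")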

Model Deployments - Models deployed on OCI Data Science can, while they are active, be accessed via an HTTP endpoint using API calls. You can also specify the compute resources and conda environment for the deployed model to run on, a load balancer to distribute traffic to multiple VMs, and a logging service to capture detailed information around model requests. The deployed model will automatically capture various model deployment metrics (listed below).

When the model is in an active state, it can be invoked to create predictions on input data by sending an HTTP request to the endpoint; the deployment sends back an HTTP response containing the predictions. You can send these HTTP requests using the OCI CLI, the OCI Python SDK, or the Java SDK, and code examples are provided when your model is deployed.
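
For example, with the OCI Python SDK, a signed request to a deployment's predict endpoint might look like the following; the endpoint URL and payload are placeholders, and authentication is assumed to use an API-key configuration in ~/.oci/config:

    # Invoke an OCI model deployment endpoint with a signed HTTP request.
    # The endpoint URL and payload are placeholders; authentication assumes
    # an API-key configuration in ~/.oci/config.
    import oci
    import requests

    config = oci.config.from_file()  # default profile
    signer = oci.signer.Signer(
        tenancy=config["tenancy"],
        user=config["user"],
        fingerprint=config["fingerprint"],
        private_key_file_location=config["key_file"],
    )

    endpoint = "https://modeldeployment.<region>.oci.customer-oci.com/<deployment_ocid>/predict"
    response = requests.post(endpoint, json={"features": [[5.1, 3.5, 1.4, 0.2]]}, auth=signer)
    print(response.json())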

Deactivating your model will stop the associated instances and the load balancer, making the HTTP endpoint unavailable. However, a deployment can be reactivated at any time and will use the same HTTP endpoint.

Tracked Model Deployment Metrics -

  • Number of HTTP requests.
  • Response result and status.
  • Latency of predict calls.
  • Provisioned and consumed bandwidth.
  • CPU Utilisation.
  • Memory Utilisation.

Data Science Jobs - The automation of the machine learning lifecycle can be facilitated using OCI Jobs and Pipelines. OCI Jobs are templates that can be used to run custom tasks; these tasks can include executable code in Python or Bash/Shell, or a zip or compressed tar file containing an entire project written in Python or Java.

Data Science Pipelines - OCI Pipelines consist of multiple steps, which can be predefined OCI Jobs or Python, Bash, or Java scripts. Steps can depend on previous steps, so you can build an entire workflow from start to finish. These steps could cover any, or all, of the machine learning lifecycle, from data processing to deployment, and could therefore be used to automate the training, validation, and deployment of models for MLOps.

To find out more about how Rittman Mead can help with productionising models or OCI Data Science Services contact us at info@rittmanmead.com.