In brief
- This article describes the basics of machine learning operations (MLOps) and how to apply these practices while handling machine learning (ML) models on the Azure cloud
- Small code fragments are provided to show how to register models in the Azure workspace environment and deploy them to a cluster
- A lane detection use case is used to describe the relevant workflow as a set of steps
Abstract
This article outlines the MLOps architecture provided by Microsoft Azure and the separation of roles and responsibilities between a DevOps engineer and a data scientist. A lane detection use case is used to describe the relevant workflow as a set of steps.
Introduction
In one of our projects, we evaluated different versions of trained machine learning (ML) models and monitored their performance in real time in a production-like environment using some of the most popular visualization tools. In our end-to-end workflow, we used MLflow in an Azure cloud environment. We share an overview of this working knowledge in this article, split across two parts. The first part covers the basics of MLOps and how to apply these practices while handling ML models on the Azure cloud. Small code fragments are provided to show how to register models in the Azure workspace environment and deploy them to a cluster.
Once the models are put into production, monitoring their performance against new incoming data and handling any drift in that performance is equally important. Scanning the generated scores and log files on screen in real time to make sound decisions is quite tedious. However, some tools helped us monitor the health of our cluster and containerized applications with logs, metrics, and triggers on custom-built dashboards. These topics are covered using two popular open-source tools, Kibana and Grafana, and form part two of this article.
Use case: Lane detection
The requirement in our project was to have a lane detection model on which we could research the principles of MLOps. A lane detection algorithm tries to identify the yellow and white lane markings on the road, thereby assisting the driver in steering and supporting autonomous driving. For this task, we first explored various open-source datasets such as Caltech, BDD100K, CULane, TuSimple Lane, and KITTI. Of all the datasets explored, BDD100K is one of the largest annotated driving datasets. It contains image frames captured in variable weather conditions and is useful for a wide range of computer vision problems.
As our project’s main goal wasn’t to develop a lane detection model as such, we looked for open-source models that we could use in our research. Of the models explored on the BDD100K dataset, the Hybridnets model developed by Dat Vu and others was the most suitable for our task in terms of performance. The Hybridnets model has multiple objectives: traffic object detection, drivable area segmentation, and lane line detection. We modified the model output to serve only lane line identification.
The custom-calculated metrics were the Intersection over Union (IoU) score, accuracy, precision, recall, and inference time. These scores were taken into consideration to evaluate the performance of the model.
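As an illustration of how such metrics can be derived from a predicted lane mask and its ground truth, here is a minimal sketch in Python; the function name and the epsilon guard are our own choices and not part of the project's scoring script.

import numpy as np

def segmentation_metrics(pred_mask, gt_mask):
    # Both masks are numpy arrays of 0/1 values with the same shape.
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)

    tp = np.logical_and(pred, gt).sum()      # lane pixels correctly predicted
    fp = np.logical_and(pred, ~gt).sum()     # background predicted as lane
    fn = np.logical_and(~pred, gt).sum()     # lane pixels missed
    tn = np.logical_and(~pred, ~gt).sum()    # background correctly predicted

    eps = 1e-9                               # avoid division by zero on empty masks
    return {
        "Iou": tp / (tp + fp + fn + eps),
        "Accuracy": (tp + tn) / (tp + fp + fn + tn + eps),
        "Precision": tp / (tp + fp + eps),
        "Recall": tp / (tp + fn + eps),
    }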
MLOps
MLOps is a set of practices that aims to move ML models from development into production faster. An MLOps cycle consists of various stages belonging to machine learning activities, followed by operations from software engineering. For example, an MLOps cycle could include data preparation, feature engineering, and model training and evaluation on the machine learning side, and model packaging, deployment, and monitoring on the operations side. It is therefore usual to see data scientists and DevOps engineers working together on such projects.
The advantage of using MLOps principles is that they facilitate continuous integration (CI) of the development code, continuous delivery (CD) into production, as well as continuous training (CT). Unlike DevOps, which involves only CI and CD, the CT stage is equally important in the MLOps lifecycle because the performance of a model deteriorates over time.
Understanding MLOps vs DevOps [Image Source]
MLOps on Azure
The figure below depicts the architecture of the MLOps framework provided by Microsoft. It includes all three stages: continuous integration (CI), continuous delivery (CD), and continuous training (CT). These pipelines include the activities carried out by DevOps engineers and data scientists. For example, in a project the DevOps engineer could provision the workspace for machine learning tasks and help with model deployment and post-production monitoring, while data scientists carry out tasks such as feature engineering, model training, and evaluation.
Architecture of the MLOps framework in Azure [Image by Microsoft in Source]
To get started with ML on Azure, the first step is to create a workspace. The Azure Machine Learning workspace acts as a common umbrella under which data and compute clusters can be accessed, models can be trained, metrics logged, and much more. As per our project’s requirements, we used the Azure workspace to compare the performance of different versions of a pretrained model (trained on different dimensions of data) by running experiments. The model was deployed to an Azure Kubernetes Service (AKS) cluster. Below is the set of steps that detail the use of MLOps principles for efficient model deployment and evaluation.
1. Register data and model with the AzureML workspace
The data in AzureML can be accessed from external cloud sources such as a data lake. In our experiments, the data and model were accessed locally using the Azure command line interface (CLI). The Azure CLI, along with the Azure Python SDK, helps connect local artifacts with resources on the Azure cloud. A workspace on Azure is associated with a subscription ID, a resource group name, and a workspace name. These details are present in a config file that is made available immediately after the workspace resource is created, and the config file is then used to establish the connection.
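A minimal sketch of this connection step from Python, assuming the downloaded config.json sits next to the script, could look as follows; the resulting ws object is the one used in the registration snippet below.

from azureml.core import Workspace

# Loads the subscription ID, resource group and workspace name from config.json,
# which can be downloaded from the Azure portal after the workspace is created.
ws = Workspace.from_config(path="./config.json")
print(ws.name, ws.resource_group, ws.location, sep="\n")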
from azureml.core.model import Model

model1 = Model.register(workspace=ws,          # Workspace credentials loaded from a config file
                        model_path=model_dir + "/" + "lane_detection.onnx",  # Local model path
                        model_name="Lane_detection_model_1",
                        tags={"Model1": "V1"},
                        description="First model version")
Azure supports registering models of various formats, such as scikit-learn, PyTorch, TensorFlow, etc. ONNX is an intermediate representation of ML models that is favorable when working in a cloud environment, the advantages being low response time and interoperability between different programming languages and frameworks. Official packages for converting PyTorch/TensorFlow models to ONNX are available, and the conversion takes only a few lines of code. In the AzureML workspace, the registered data and models (registered with the code above) can be viewed on their respective tabs in the left panel. The figure below gives a view of the registered models in the AzureML workspace. When models are retrained and registered again, the version number auto-increments.
Overview of deployed models
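As for the ONNX conversion mentioned above, a minimal sketch using torch.onnx.export is shown below; the tiny network and the input resolution are placeholders standing in for the actual lane detection model.

import torch
import torch.nn as nn

# Placeholder network standing in for a trained lane detection model.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 1))
model.eval()

# Dummy input fixing the expected image size: (batch, channels, height, width).
dummy_input = torch.randn(1, 3, 384, 640)

torch.onnx.export(
    model,
    dummy_input,
    "lane_detection.onnx",            # file path later passed to Model.register
    input_names=["image"],
    output_names=["segmentation_map"],
    opset_version=12,
)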
2. Package model and dependencies
Once a model is registered with the workspace, it can be packaged into a container for deployment. The model is then exposed as a REST API endpoint and can interact via JSON requests. Packaging a model needs a model dependencies file and an entry script. The entry script, also known as the scoring script, contains two functions, init() and run(). The init() function contains the code to initialize the model from the Azure directory where it was registered, while the run() function accepts JSON requests, makes calls for pre-processing and model prediction, performs post-processing such as metric calculation, and sends back the response. Below are the code fragments for creating an inference configuration with the packaged dependencies and scoring file.
from azureml.core.conda_dependencies import CondaDependencies

myenv = CondaDependencies.create(pip_packages=["numpy", "opencv-python-headless",
                                               "onnxruntime", "azureml-core", "azureml-defaults"])
with open("myenv.yml", "w") as f:
    f.write(myenv.serialize_to_string())

from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment

myenv = Environment.from_conda_specification(name="myenv", file_path="myenv.yml")
inference_config1 = InferenceConfig(entry_script="score1.py", environment=myenv)
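For reference, a stripped-down sketch of what the score1.py entry script could look like is given below. The pre- and post-processing steps are placeholders, and only the IoU is computed here for brevity; our actual script also returned accuracy, precision, recall and inference time.

# score1.py (sketch): the init()/run() pair that AzureML expects in an entry script.
import json
import numpy as np
import onnxruntime as ort
from azureml.core.model import Model

session = None

def init():
    # Runs once when the service container starts: locate and load the registered model.
    global session
    model_path = Model.get_model_path("Lane_detection_model_1")
    session = ort.InferenceSession(model_path)

def run(raw_data):
    # Runs per request: decode the JSON payload, infer, and return the results.
    data = json.loads(raw_data)
    image = np.array(data["image"], dtype=np.float32)
    ground_truth = np.array(data["output"])

    # Placeholder pre-processing: reshape to NCHW and scale to [0, 1].
    batch = np.transpose(image, (2, 0, 1))[np.newaxis] / 255.0

    input_name = session.get_inputs()[0].name
    logits = session.run(None, {input_name: batch.astype(np.float32)})[0]

    # Placeholder post-processing: threshold the output into a binary lane mask.
    seg_map = (logits.squeeze() > 0.5).astype(np.uint8)

    intersection = np.logical_and(seg_map, ground_truth).sum()
    union = np.logical_or(seg_map, ground_truth).sum()

    return {"segmentation_map": seg_map.tolist(),
            "Iou": float(intersection) / float(union + 1e-9)}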
3. Deploy the model
The containerized model can be deployed either to an Azure Container Instance (ACI) or to an Azure Kubernetes Service (AKS) cluster. An ACI instance is ideal for small-scale, lightweight container deployments for test or automation jobs. There is no need to create a dedicated virtual machine cluster, which also saves resource costs.
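For comparison, a minimal ACI deployment reusing the registered model and inference configuration from the previous steps could look like the sketch below; the service name and resource sizes are example values.

from azureml.core.webservice import AciWebservice
from azureml.core.model import Model

# Lightweight container instance: no cluster to provision or manage.
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=2)

service_test = Model.deploy(ws, "lane-detection-aci-test", [model1],
                            inference_config1, aci_config)
service_test.wait_for_deployment(show_output=True)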
However, when a dedicated environment is needed to deploy large containers, handle networking and autoscaling, and manage their services, AKS is the appropriate choice. The number of worker nodes where the service should be deployed has to be specified, and AKS takes care of the orchestration. Here, charges are incurred per worker node added to the cluster. The code fragment below shows how a service endpoint can be created using the registered model and the inference configuration from the previous step.
from azureml.core.webservice import AksWebservice, Webservice
from azureml.core.model import Model
from azureml.core.compute import AksCompute

aks_target = AksCompute(ws, "Lane-detection")
deployment_config = AksWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
service1 = Model.deploy(ws, "lane-detection-service-1", [model1],
                        inference_config1, deployment_config, aks_target)
service1.wait_for_deployment(show_output=True)
Overview of containerized services deployed in an Azure Kubernetes instance
When using either of the two services — ACI or AKS — the interaction is with a REST endpoint. It’s advisable to delete these services when they are not in use to avoid incurring additional charges.
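As an illustration of that interaction, a raw REST call to the AKS endpoint could look like the sketch below; the scoring URI and authentication key are read from the deployed service object, the payload mirrors what our scoring script expects, and the tiny placeholder arrays stand in for a real frame and its label.

import json
import numpy as np
import requests

# Tiny placeholder arrays; real requests carry full-size frames and labels.
test_image = np.zeros((8, 8, 3), dtype=np.uint8)
ground_truth_image = np.zeros((8, 8), dtype=np.uint8)

scoring_uri = service1.scoring_uri
key, _ = service1.get_keys()              # AKS services are key-authenticated by default

headers = {"Content-Type": "application/json",
           "Authorization": f"Bearer {key}"}
payload = json.dumps({"image": test_image.tolist(),
                      "output": ground_truth_image.tolist()})

response = requests.post(scoring_uri, data=payload, headers=headers)
print(response.json())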
4. Make requests and log metrics
The performance of the model can be inferred by sending requests to the deployed service endpoint. In the workspace, under the Jobs tab, the results of the experiments can be traced. The experiments carried out during the training phase are also recorded in this section. A set of runs is grouped under a common experiment; an example of a run could be validating a model for a given set of hyperparameters.
The tracking of metrics, logs, artifacts and hyperparameters is necessary during the model training as well as the validation phase. Tools such as MLflow help in managing the end-to-end lifecycle of the experiments.
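Before any logging, MLflow has to be pointed at the workspace tracking store. A minimal sketch, assuming the ws object from earlier and the azureml-mlflow package being installed, is shown below; the run created in the next fragment then appears under the Jobs tab.

import mlflow

# Route MLflow tracking calls (runs, metrics, artifacts) to the AzureML workspace.
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())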
import json
import time
import cv2
import mlflow
import numpy as np

exp_name = "Lane_detection_prod"
mlflow.set_experiment(exp_name)

i = 1
run_name = 'First Model Run'
with mlflow.start_run(run_name=run_name):
    for name in image_file_names:
        print("Processing image:", i, "of", len(image_file_names))
        # Load the test frame and its ground-truth lane mask.
        test_image = cv2.imread(image_path + '\\' + name + '.jpg')
        ground_truth_image = cv2.imread(labels_path + '\\' + name + '.png', cv2.IMREAD_GRAYSCALE)

        # Send both to the deployed service and collect the response.
        test_data = json.dumps({'image': test_image.tolist(), 'output': ground_truth_image.tolist()})
        response = service1.run(test_data)

        seg_map = np.array(response['segmentation_map'])
        seg_map[seg_map == 1] = 0
        segmented_image = draw_seg_image(seg_map, test_image)   # project helper that overlays the mask on the frame

        # Log the metrics computed by the scoring script under the current run.
        mlflow.log_metric("Iou", response['Iou'])
        mlflow.log_metric(key="Accuracy", value=response['Accuracy'])
        mlflow.log_metric(key="Precision", value=response['Precision'])
        mlflow.log_metric(key="Recall", value=response['Recall'])
        mlflow.log_metric(key="Inference_time", value=response['Inference_time'])

        image_name = "Output_image_" + str(i) + ".png"
        print(image_name)
        mlflow.log_image(segmented_image, image_name)
        i += 1

mlflow.end_run()
The run function in our scoring script receives a JSON request containing the test image and its ground truth. With the ground truth and the predicted segmentation map available, the metrics are computed. In the code fragment above for creating a run, you can see the image being encoded into a JSON request and passed to the deployed service call. The computed segmentation map and metrics are received back in the response and logged under a run ID using MLflow. It is also possible to log artifacts such as files and images under the same run ID along with the metrics.
The image below showcases an example of logged metrics and outputs under a single run.
Logging metrics in a single run
On the Jobs panel, it is possible to create and download charts that compare performance between runs using the metrics logged with MLflow.
Overview of all runs in an experiment
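The same comparison can also be pulled programmatically from the tracking store. A small sketch is given below, assuming the tracking URI is already pointed at the workspace as shown earlier; the column names follow MLflow's metrics.<name> convention and hold the latest logged value per run.

import mlflow

# Fetch all runs of the experiment as a pandas DataFrame and rank them by IoU.
exp = mlflow.get_experiment_by_name("Lane_detection_prod")
runs = mlflow.search_runs(experiment_ids=[exp.experiment_id])
print(runs[["tags.mlflow.runName", "metrics.Iou", "metrics.Accuracy",
            "metrics.Inference_time"]].sort_values("metrics.Iou", ascending=False))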
Monitoring application health
As mentioned in the introduction of this article, we had two primary targets, the second of which was monitoring the health of the project as it ran. This section delves into that subject in more detail.
We deployed models into an Azure Kubernetes cluster and were able to track the performance of our models on the Jobs dashboard. The health of the Kubernetes cluster itself can be monitored with Azure Monitor, available on the Azure portal, in terms of CPU and memory load, etc. However, these functionalities are best suited to small-scale applications. As closely observing our deployments in the cluster was important, we took advantage of the open-source tools available for this purpose. The two most popular tools for monitoring logs and metrics are Kibana and Grafana. With these tools, it is possible to have ready-to-use metrics along with custom-calculated ones on an interactive dashboard.
In our project we used these tools to monitor the logs and metrics of our three differently trained models. The subsections below describe the functionalities of each of them.
Grafana
Grafana is a web-based visualization tool that helps monitor metrics on a custom-built dashboard. Grafana supports various data sources, such as PostgreSQL, InfluxDB, Prometheus, etc. A developer working with these tools may need to know how to write queries in the language supported by each data source. For metric visualization, Grafana supports commonly used chart types such as bar, line, heatmap, and pie.
In our lane detection project, we used the Prometheus and PostgreSQL data sources in two different combinations. Prometheus is a software application that scrapes metrics at regular intervals from each of the worker nodes in the Kubernetes cluster. It works on an HTTP pull model and stores the data in its local time-series database. To set up the combination of Grafana and Prometheus, we created a namespace in the cluster and used Helm charts to install Grafana and Prometheus. Grafana could then be accessed in a local browser by port-forwarding the Prometheus-Grafana service. Use the link in the references for installation.
With the help of this setup, we could monitor the health of our Kubernetes cluster. All our deployments, pods, and services running inside the cluster could be visualized. These include resource metrics such as CPU, memory, and disk utilization, as well as system metrics such as request throughput, error rate, and request latency. The figure below illustrates this. These metrics are computed by Prometheus and can be found under “Dashboards -> Kubernetes -> Compute Resources”.
System metrics on Grafana
The performance of our lane detection model couldn’t be captured by Prometheus, as it had no knowledge of the Python application running inside the pods. To make this possible with Prometheus, our metrics would have to be sent to the Prometheus server explicitly, which requires instrumenting the code with the Prometheus client to expose the metrics as Prometheus metric types. Instead, we took an alternative approach by designing a custom metric exporter for Azure and using a PostgreSQL database. Our metric exporter service uses a service principal to connect to the AzureML workspace and extract the MLflow metrics that were created earlier. The exporter service polls the Azure resource at a short interval and sends the data to a PostgreSQL database. The exporter and database services were wrapped in two separate Docker containers; the architecture used is illustrated below, followed by a condensed sketch of the exporter.
Architecture overview for visualizing custom metrics on Grafana
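The sketch below condenses the exporter loop described above. The service-principal credentials, the table layout and the polling interval are placeholders, and a real exporter would also deduplicate rows and handle errors.

import time
import psycopg2
import mlflow
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication
from mlflow.tracking import MlflowClient

# Authenticate against the workspace with a service principal (placeholder IDs).
auth = ServicePrincipalAuthentication(tenant_id="<tenant-id>",
                                      service_principal_id="<client-id>",
                                      service_principal_password="<client-secret>")
ws = Workspace.get(name="<workspace-name>", auth=auth,
                   subscription_id="<subscription-id>", resource_group="<resource-group>")
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

client = MlflowClient()
conn = psycopg2.connect(host="postgres", dbname="metrics", user="grafana", password="<password>")

while True:
    exp = client.get_experiment_by_name("Lane_detection_prod")
    for run in client.search_runs([exp.experiment_id]):
        for metric_name, value in run.data.metrics.items():
            with conn.cursor() as cur:
                # Hypothetical table holding one row per run and metric.
                cur.execute("INSERT INTO mlflow_metrics (run_id, metric, value) VALUES (%s, %s, %s)",
                            (run.info.run_id, metric_name, value))
    conn.commit()
    time.sleep(60)    # poll the workspace once a minute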
We had the Docker containers running locally, but for a production-like environment these containers can run on the cluster, i.e., the Azure Kubernetes cluster. After setting up PostgreSQL as a data source in Grafana, we were able to visualize the same metrics available in the AzureML workspace, but now in Grafana and with richer interaction.
Visualizing custom machine learning metrics on Grafana
Kibana
Kibana is a visualization tool for monitoring logs and metrics. It is part of the ELK stack (Elasticsearch-Logstash-Kibana) owned by Elastic NV. To install Kibana on a Kubernetes cluster, the entire ELK stack has to be set up. Unlike Grafana, Kibana supports only a single data source: the Elasticsearch database (Elasticsearch, on the other hand, can also be used as a data source for Grafana). Therefore, for Kibana, all logs must first be shipped to an Elasticsearch database before any visualization is created. Once the data is stored in Elasticsearch, it can be indexed, searched, and queried. The query language used by Kibana is called Kibana Query Language (KQL). The ELK stack architecture is illustrated below.
Architecture of ELK stack (Source: guru99)
Filebeat, represented as Beats in the architecture above, collects the logs from all the nodes in the cluster and is responsible for shipping them to Elasticsearch. These logs can be modified, altered, or parsed using Logstash, which applies the transformations as a pipeline of operations. In our project, we used Logstash to make Elasticsearch understand and store our custom metric types and values.
Once the metrics were calculated, we first wrote the metric type and value to the logs in the run function of the scoring script. We then used grok patterns (see the references for more) to map the metric type and metric score present in the logs into new fields for Elasticsearch. The grok filter was added to the values.yaml file of Logstash (code snippet below), and the transformed output was received by Elasticsearch.
logstashPipeline:
  logstash.conf: |
    input {
      beats {
        port => 5044
      }
    }
    filter {
      grok {
        match => { "message" => "%{WORD:metrictype} : %{NUMBER:metricscores:float}" }
      }
    }
    output {
      elasticsearch {
        hosts => "http://elasticsearch-master:9200"
      }
    }
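For reference, the log lines emitted from the run() function only have to match that grok pattern. A hypothetical example with placeholder scores is shown below; the printed output from the pod is what Filebeat picks up.

# Inside run(), after the metrics are computed: emit one log line per metric
# in the "<metrictype> : <score>" form expected by the grok filter above.
metrics = {"Iou": 0.71, "Accuracy": 0.96, "Precision": 0.82,
           "Recall": 0.77, "Inference_time": 0.094}       # placeholder values

for metric_type, score in metrics.items():
    print(f"{metric_type} : {score}")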
With this setup, Kibana dashboards can now be created for live visualization of metrics and other logs. To get started with Kibana, first create an index pattern by going to the “Stack Management -> Index Patterns -> Create index pattern” page and specifying the source as “logstash-*” or “filebeat-*”. The live streaming logs can then be observed on the “Analytics -> Discover” page. To create a dashboard, follow the illustration below.
Step 1 — Reach the Create dashboard link using the Analytics panel on the left
Step 2 — Selecting different elements of a chart
The “metric type” and “metric score” fields created with Logstash were now reachable from the filters panel shown in Step 2. Our metrics dashboard developed in Kibana is shown below. The MLflow metrics that were previously seen on the Azure dashboard could now be observed in Kibana. A dashboard of logs created using the filters and chart elements is also illustrated below.
Dashboard of metric comparison of different deployed models
Dashboard of streaming performance logs in Kibana
In a nutshell, both Kibana and Grafana are powerful visualization tools. Grafana is more suitable for visualizing metrics while Kibana is better suited to logs, although both tools can be used for both purposes. Both also support the creation of alerts and user management. As Kibana has Elasticsearch as its only data source, the choice can shift toward Grafana if the data source is of importance. With respect to cluster deployments, in our experience working with Kibana was easier than with Grafana. Kibana also offers a wide range of chart types with simple editing and query functionalities.
Conclusion
This article outlined the MLOps architecture provided by Microsoft Azure and the separation of roles and responsibilities between a DevOps engineer and a data scientist. Using the lane detection use case, the workflow for registering models with the Azure cloud and deploying them to a cluster after containerization was described as a set of steps. We then introduced two popular open-source tools for monitoring machine learning deployments and outlined their architectures. The choice between them can be made based on the project requirements and the database technology in use.