AutoML in Azure Machine Learning

August 10, 2021 by Dinesh Asanka

Introduction

In this article, we are going to discuss AutoML in the Azure Machine Learning service. In previous articles, we discussed the features of Azure Machine Learning (Classic); in this article, we move on to the Azure Machine Learning service.

As you may remember, you do not need an Azure subscription to use the features of Azure Machine Learning (Classic). The Azure Machine Learning service, however, requires an Azure subscription. To enable the Azure Machine Learning service, you need to add it from the Azure Portal as shown below.

Creating Azure Machine Learning Service

Once the Azure Machine Learning service is created by providing the resource group and other relevant details, you can launch it by using the Go to Resource option and then the Launch option.

After those options are selected, you will be taken to the following screen.

Azure Machine Learning Studio screen

Before moving into the Azure Machine Learning service, let us look at AutoML in detail.

What is AutoML

If you remember, in previous articles we discussed different techniques that can be used to solve business problems in Azure Machine Learning, such as Regression analysis, Classification analysis, Clustering, Recommender Systems and Anomaly Detection in Time Series. However, applying a technique was never enough on its own: we also had to perform many other activities, such as basic data cleaning, feature selection, Principal Component Analysis, comparing models, cross-validation and hyperparameter tuning, to improve accuracy. This amount of work may be too much for a developer who does not have much knowledge of statistics.

The following diagram explains the different processes that you need to follow during the Machine Learning process.

Standard Machine Learning Process

As you can see in the above diagram, the standard Machine Learning practice is complex because many steps must be completed before modelling. AutoML performs these tasks for you and provides the best model it finds. You only need to deploy that model and use it.
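The core idea of trying several candidates and keeping the best one can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Azure Machine Learning implements AutoML: the candidate rules, the toy voting rows and the helper names (accuracy, select_best) are all hypothetical, and real AutoML also trains and tunes each candidate.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[List[str], str]  # (feature values, class label)


def accuracy(predict: Callable[[List[str]], str], rows: List[Row]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for features, label in rows if predict(features) == label)
    return correct / len(rows)


def select_best(candidates: Dict[str, Callable[[List[str]], str]],
                validation_rows: List[Row]) -> List[Tuple[str, float]]:
    """Score every candidate and return (name, score) pairs, best first."""
    scored = [(name, accuracy(fn, validation_rows)) for name, fn in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical validation rows: two votes ("y"/"n") and a party label.
validation_rows: List[Row] = [
    (["y", "n"], "democrat"),
    (["y", "y"], "democrat"),
    (["n", "y"], "republican"),
    (["n", "n"], "republican"),
    (["y", "y"], "democrat"),
]

# Hypothetical candidates: fixed rules standing in for trained models.
candidates = {
    "always_democrat": lambda f: "democrat",
    "rule_on_vote_0": lambda f: "democrat" if f[0] == "y" else "republican",
    "rule_on_vote_1": lambda f: "democrat" if f[1] == "y" else "republican",
}

ranking = select_best(candidates, validation_rows)
best_name, best_score = ranking[0]
print(best_name, best_score)
```

The ranking is sorted in descending order of the score, so the first entry plays the role of the "best model" that would then be deployed.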

AutoML in Azure Machine Learning

There are many frameworks for AutoML; let us look at how Azure Machine Learning supports it.

Selecting the option of AutoML in Azure Machine Learning

Once the New Automated ML run option is selected, you will be taken to a new screen, where you first need to configure a dataset for the Machine Learning process.

We have selected the popular Vote dataset, which describes how members of the US Congress voted on different acts.

Uploading a dataset.

In the above example, we have uploaded the vote CSV file. After the dataset is created, you can select it as shown in the below figure.

Selection of the dataset.

The next step is to configure AutoML in Azure Machine Learning.

Configuring AutoML in Azure Machine Learning.

You need to provide the experiment name. Since this is a classification problem, we also need to specify the target column. In this example, Class is the target attribute. Please note that the Class attribute indicates the party to which each member belongs.

AutoML in Azure Machine Learning gives you the option of running the Machine Learning process on separate, dedicated hardware. Since Machine Learning often needs a large volume of data, you need scalable hardware to build Machine Learning models. You can define the required hardware by using the Create a new compute option as shown in the below screenshot.

Defining the required hardware for ML models.

You can choose dedicated hardware with the necessary configuration, and the pricing is shown before you use it. You can also increase or decrease the capacity of the hardware depending on your needs.

Further, you can configure the settings for the compute nodes as shown in the below screenshot. You can define the minimum and maximum number of nodes that should be used during the AutoML process.

Configuration of minimum and maximum nodes for the ML execution.

After the compute nodes are defined, you need to select the task type.

Selection of the Task type.

As you can see in the above screenshot, AutoML in Azure Machine Learning supports three types of tasks: Classification, Regression and Time Series forecasting. We will discuss the Classification task in this article and leave Regression and Time Series forecasting for a future article.

Configuration of AutoML for Classification task.

First, we need to choose the primary metric, such as accuracy, AUC (Area Under the Curve) weighted, recall, precision, F1 measure or MCC. The model with the highest value for the selected metric will be chosen as the best model. In this example, AUC weighted is selected as the primary metric.
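To make these measures concrete, the following minimal sketch computes accuracy, precision, recall, F1 and MCC for a binary problem from the confusion counts, using their standard definitions. The labels and predictions are made-up values for illustration, and the function names are hypothetical. AUC is not included because it needs predicted scores rather than predicted labels.

```python
import math
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[str], y_pred: List[str],
                     positive: str) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true: List[str], y_pred: List[str],
                           positive: str) -> Dict[str, float]:
    """Compute common binary classification measures from confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Made-up labels: "d" = democrat, "r" = republican.
y_true = ["d", "d", "d", "r", "r", "r", "d", "r"]
y_pred = ["d", "d", "r", "r", "r", "d", "d", "r"]

metrics = classification_metrics(y_true, y_pred, positive="d")
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Whichever measure is chosen, it becomes the single number by which candidate models are ranked, so it should reflect what matters for the business problem.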

Sometimes, you do not want certain algorithms to be considered during modelling. You can add those algorithms to Blocked Algorithms so that they are excluded during the AutoML execution.

Since AutoML runs multiple iterations, we need to set an exit criterion; otherwise, the run may continue far longer than needed. In this example, 24 hours was set as the exit criterion. We have used a 70/30 validation split for training and testing, and the maximum number of concurrent iterations is set to 2.

With these configurations, we are now ready to execute AutoML in Azure Machine Learning.
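The two settings in this step can be illustrated with a small, self-contained Python sketch: a shuffled 70/30 split, and a loop that stops starting new iterations once a time budget is spent. This is an assumption-laden illustration rather than Azure's implementation; the function names, the seed and the placeholder iterations are hypothetical.

```python
import random
import time
from typing import Callable, List, Sequence, Tuple


def train_validation_split(rows: Sequence, validation_fraction: float = 0.3,
                           seed: int = 42) -> Tuple[list, list]:
    """Shuffle the rows reproducibly and hold out a fraction for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


def run_with_time_budget(iterations: List[Callable[[], object]],
                         budget_seconds: float) -> list:
    """Run iterations in order until none are left or the budget is spent."""
    deadline = time.monotonic() + budget_seconds
    results = []
    for run_iteration in iterations:
        if time.monotonic() >= deadline:
            break  # exit criterion reached: do not start another iteration
        results.append(run_iteration())
    return results


rows = list(range(100))
train_rows, validation_rows = train_validation_split(rows)
print(len(train_rows), len(validation_rows))

# Placeholder iterations; each one would normally train and score a model.
completed = run_with_time_budget([lambda: "model_1", lambda: "model_2"],
                                 budget_seconds=60)
print(completed)
```

The budget is only checked between iterations in this sketch, so an iteration that has already started is allowed to finish.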

Results

Now let us see the results of AutoML in Azure Machine Learning. First, it provides the details of the AutoML execution.

Details of the AutoML execution

It provides the created and started times for the AutoML execution. It also shows the best-performing model among those tried, which in this case is the combination of MaxAbsScaler (scaling) and GradientBoosting (the algorithm). Further, it shows the value of the selected metric for that model.

Then, from the Models tab, you can see all the models related to the selected AutoML run as shown in the following figure.

Models in AutoML

It shows the value of the selected metric for each algorithm in descending order, which means the best algorithms appear at the top.

Data preprocessing is one of the important tasks in AutoML in Azure Machine Learning, and it is performed automatically.

These tasks are shown under Data Guardrails as shown in the following figure.

Data Guardrails in AutoML in Azure Machine Learning.

There are three preprocessing tasks related to classification, as shown in the above figure:

  • Class balancing detection -> Checks whether the classes of the target attribute are balanced in the training data, since a heavily imbalanced target can bias the model. When the data is split into train and test datasets, both should hold similar percentages of each target class
  • Missing feature values imputation -> In a large dataset, there can be missing values. For better results, these missing values need to be filled. Numerical values are replaced with the average, while nominal values are replaced with the most frequent value
  • High cardinality feature detection -> Features with a very large number of distinct values are detected, if there are any
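The three checks above can be approximated with the standard library. The sketch below is only an illustration of the ideas; the thresholds (min_ratio, max_unique_ratio) and function names are hypothetical and are not the rules Azure Machine Learning applies internally.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Optional, Tuple


def class_balance(labels: List[str],
                  min_ratio: float = 0.2) -> Tuple[bool, Dict[str, int]]:
    """Report whether the rarest class is at least min_ratio of the largest."""
    counts = Counter(labels)
    smallest, largest = min(counts.values()), max(counts.values())
    return smallest / largest >= min_ratio, dict(counts)


def impute(column: List[Optional[object]]) -> list:
    """Fill None with the mean (numeric) or the most frequent value (nominal)."""
    present = [value for value in column if value is not None]
    if all(isinstance(value, (int, float)) for value in present):
        fill = mean(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if value is None else value for value in column]


def is_high_cardinality(column: List[Optional[object]],
                        max_unique_ratio: float = 0.5) -> bool:
    """Flag a column whose share of distinct values exceeds the threshold."""
    present = [value for value in column if value is not None]
    return len(set(present)) / len(present) > max_unique_ratio


labels = ["democrat"] * 6 + ["republican"] * 4
balanced, counts = class_balance(labels)
print(balanced, counts)

print(impute([1.0, None, 3.0]))       # numeric column: mean of present values
print(impute(["y", "n", None, "y"]))  # nominal column: most frequent value

print(is_high_cardinality(["y", "n", "y", "n"]))
print(is_high_cardinality(["a", "b", "c", "d"]))
```

These helpers assume each column has at least one non-missing value; a production check would need to handle empty columns as well.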

As you can see, all the checks passed. The following are the details of the best algorithm for the selected dataset in AutoML in Azure Machine Learning.

Selected best model in AutoML.

Though we selected AUC weighted as the metric, we should also look at the other metrics, as shown in the following figure.

Other Accuracy parameters for the selected model.

You can deploy the selected model to either Azure Kubernetes Service (AKS) or Azure Container Instances (ACI). Further, you can download the model script as well.

Conclusion

AutoML is a new concept that has emerged to simplify the Machine Learning process. With AutoML in Azure Machine Learning, you only need to provide the dataset and a minimal configuration. A few preprocessing tasks are executed automatically on the selected dataset. AutoML then selects the best algorithm for the chosen metric, and this model can be deployed to production.

AutoML in Azure Machine Learning provides options for Classification, Regression and Time Series Forecasting. This article has discussed how classification is done in AutoML.

Table of contents

Introduction to Azure Machine Learning using Azure ML Studio
Data Cleansing in Azure Machine Learning
Prediction in Azure Machine Learning
Feature Selection in Azure Machine Learning
Data Reduction Technique: Principal Component Analysis in Azure Machine Learning
Prediction with Regression in Azure Machine Learning
Prediction with Classification in Azure Machine Learning
Comparing models in Azure Machine Learning
Cross Validation in Azure Machine Learning
Clustering in Azure Machine Learning
Tune Model Hyperparameters for Azure Machine Learning models
Time Series Anomaly Detection in Azure Machine Learning
Designing Recommender Systems in Azure Machine Learning
Language Detection in Azure Machine Learning with basic Text Analytics Techniques
Azure Machine Learning: Named Entity Recognition in Text Analytics
Filter based Feature Selection in Text Analytics
Latent Dirichlet Allocation in Text Analytics
Recommender Systems for Customer Reviews
AutoML in Azure Machine Learning
AutoML in Azure Machine Learning for Regression and Time Series
Building Ensemble Classifiers in Azure Machine Learning
Text Classification in Azure Machine Learning using Word Vectors