Time Series Anomaly Detection in Azure Machine Learning

April 1, 2021 by Dinesh Asanka

In this article, we will discuss how to use Time Series Anomaly Detection in Azure Machine Learning. This article is the next one in the Azure Machine Learning series. So far in this series, we have discussed multiple machine learning techniques such as Regression analysis, Classification analysis and Clustering. We have also discussed basic data cleaning techniques, feature selection techniques, Principal Component Analysis, comparing models, cross-validation and tuning model hyperparameters.

What is a Time Series?

A time series is a data set that contains a date-time attribute together with continuous attributes such as amount or rainfall. With the expansion of IoT devices, a lot of time series data is generated today. A time series has many components, as discussed in this blog post, and because of this complexity, time series analysis is more complex than many other kinds of analysis. Because of the large volume and high velocity of the data, time series data is also more likely to contain errors. Therefore, it is important to perform Time Series Anomaly Detection before deriving any insights from the data.

In the Azure world, there are three different tools for time series. Azure Time Series Insights lets you analyze time series across different groups. Azure Machine Learning Services provides time series forecasting. In the Azure Machine Learning portal, the Time Series Anomaly Detection control carries out anomaly detection on time series.

Data Set

We have worked with the AdventureWorks data set in most of the examples in this article series, but this time we need a data set with a date-time attribute. Let us look at the COVID-19 data set from https://data.world/shad/covid-19-time-series-data. You can download the data set and upload it to the Azure Machine Learning portal as we did in the very first article. We will use the COVID-19 confirmed cases dataset to demonstrate the features of the Time Series Anomaly Detection control in Azure Machine Learning.

This data set has three attributes: country, total, and date. By adding a Summarize Data control, you can look at the properties of the dataset. It shows that there are 70,272 records for 192 countries over a year.
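The Summarize Data control computes these statistics inside the portal. If you want to sanity-check the downloaded file locally first, a minimal sketch like the one below counts the records, distinct countries and distinct dates. It assumes the CSV columns are named country, total and date as described above; the sample rows are made up for illustration.

```python
import csv
import io


def summarize(csv_text: str) -> dict:
    """Count records, distinct countries and distinct dates in a CSV
    with country, total and date columns (column names assumed)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "records": len(rows),
        "countries": len({row["country"] for row in rows}),
        "dates": len({row["date"] for row in rows}),
    }


# Made-up sample rows, not taken from the real data set.
sample = """country,total,date
A,10,2020-03-01
A,12,2020-03-02
B,5,2020-03-01
B,7,2020-03-02
"""
print(summarize(sample))
```

For the full data set, 70,272 records for 192 countries works out to exactly 366 records per country, which is consistent with one record per country per day over a leap year.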

Time Series Anomaly Detection

Now let's see how we can incorporate the new control. To find anomalies, this control needs a unique date value for each row. In this dataset, a date is unique only within a country, so the same date appears once for every country. Therefore, you need either to filter the time series for a single country or to aggregate the data by date using the Apply SQL Transformation control.

Apply SQL Transformation control to aggregate COVID-19 dataset.

In this control, the data is aggregated by date with the query shown in the figure above. Now there is one aggregated record for each date. Next, we add the Time Series Anomaly Detection control to find the anomalies in the time series.
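The query itself appears only in the figure. Apply SQL Transformation uses SQLite syntax and exposes its first input as table t1, so an aggregation of this kind could look like the query below. The query is an assumption based on the column names described earlier, not necessarily the author's exact script; the Python wrapper only runs it against an in-memory SQLite table so it can be tried outside the portal.

```python
import sqlite3

# Hypothetical aggregation query; t1 is the name Apply SQL Transformation
# gives to its first input data set.
AGGREGATE_SQL = """
SELECT date, SUM(total) AS total
FROM t1
GROUP BY date
ORDER BY date
"""


def aggregate_by_date(rows):
    """Run the aggregation against an in-memory SQLite copy of the input."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (country TEXT, total INTEGER, date TEXT)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)
    result = con.execute(AGGREGATE_SQL).fetchall()
    con.close()
    return result


# Made-up sample rows: (country, total, date).
sample = [("A", 10, "2020-03-01"), ("B", 5, "2020-03-01"),
          ("A", 12, "2020-03-02"), ("B", 7, "2020-03-02")]
print(aggregate_by_date(sample))  # [('2020-03-01', 15), ('2020-03-02', 19)]
```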

To find the time series anomalies, a few settings of the control need to be configured, as shown in the figure below.

Configuring the Time Series Anomaly Detection control in Azure Machine Learning.

Among those settings, you need to select the time column and the data column of the time series. In this scenario, those columns are date and total, respectively. In some cases, you may have to change the data type of the date attribute by using the Edit Metadata control.

The next five parameters are used to identify the anomalies in the selected time series. There are mainly two types of anomalies: value anomalies and trend anomalies. Martingale Type is used to identify value anomalies, while Strangeness Function Type is used to identify trend anomalies.

Parameter | Option | Description
Martingale Type | PowerAvg | The default value, which works for most time series.
Martingale Type | Power | Along with the Epsilon parameter, lets you define the sensitivity.
Strangeness Function Type | RangePercentile | The default and most common option.
Strangeness Function Type | SlowPosTrend | Identifies positive trend changes.
Strangeness Function Type | SlowNegTrend | Identifies negative trend changes.

For both the martingale and the strangeness function, you can specify how many historical values should be considered. The default value is 500, and you can specify a value between 0 and 5000.

The Alert Threshold parameter defines the anomaly score above which a value is flagged as an anomaly. The default value is 3.25, and you can specify a value between 0 and 100.

After configuring the Time Series Anomaly Detection control as described above, you are ready to run the experiment, and the control produces the following results.

Output of the Time Series Anomaly Detection.

You will see that two additional columns, Anomaly Score and Alert indicator, are added to the data. Next, let us use a Split Data control in the Regular Expression splitting mode to separate the rows flagged as anomalies.

Filtering the anomalies in the time series.

This configuration outputs the anomalies found in the input time series.
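In Regular Expression mode, Split Data separates the rows whose selected column matches the expression from the rows that do not. As a rough local equivalent, the sketch below keeps the rows whose Alert indicator value matches the regular expression ^1$. The helper split_by_regex is hypothetical, the sample scores are made up, and the exact expression syntax used inside the Split Data control is not reproduced here.

```python
import re


def split_by_regex(rows, column, pattern):
    """Mimic a regex split: rows whose column value matches go to the
    first list, all other rows go to the second list."""
    regex = re.compile(pattern)
    matched, rest = [], []
    for row in rows:
        target = matched if regex.search(str(row[column])) else rest
        target.append(row)
    return matched, rest


# Made-up detector output carrying the two added columns.
scored = [
    {"date": "2020-03-01", "total": 15, "Anomaly Score": 0.4, "Alert indicator": 0},
    {"date": "2020-03-02", "total": 90, "Anomaly Score": 4.1, "Alert indicator": 1},
    {"date": "2020-03-03", "total": 21, "Anomaly Score": 0.7, "Alert indicator": 0},
]
anomalies, normal = split_by_regex(scored, "Alert indicator", r"^1$")
print([row["date"] for row in anomalies])  # ['2020-03-02']
```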

Anomalies in the time series.

As shown in the above figure, the control has identified two anomalies.

Anomaly Replacement

Although identifying anomalies is an important task, it is also important to replace them with corrected values. There are several ways to replace anomalous values:

  1. Replace with a constant
  2. Replace with the mean, mode, etc.
  3. Replace with the previous value
  4. Replace with the weighted average of the previous and next values

Let us look at how we can replace the anomalies with the weighted average of the previous and next values in the same experiment.

Complete experiment in Azure Machine Learning.

As you can see from the figure above, the experiment is somewhat complex, so we will go through it step by step. The experiment has also been published publicly and is available at https://gallery.azure.ai/Experiment/Time-Series-Anomaly-Detection-3

Step 1: Find previous and next days

We use the Execute Python Script control with a Python script to find the previous and next days for each date.
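The script itself is not reproduced in the text. A minimal sketch, assuming the aggregated data arrives as dataframe1 with a date column, adds the previous and next calendar dates to each row. azureml_main(dataframe1, dataframe2) is the entry point Execute Python Script calls in Azure Machine Learning Studio; the column names prevdate and nextdate are hypothetical.

```python
import pandas as pd


def azureml_main(dataframe1=None, dataframe2=None):
    """Add the previous and next calendar dates to every row.

    Assumes dataframe1 has a 'date' column; 'prevdate' and 'nextdate'
    are hypothetical column names used by the later joins.
    """
    df = dataframe1.copy()
    df["date"] = pd.to_datetime(df["date"])
    df["prevdate"] = df["date"] - pd.Timedelta(days=1)
    df["nextdate"] = df["date"] + pd.Timedelta(days=1)
    # Execute Python Script expects a sequence of data frames back.
    return df,


# Local usage with made-up aggregated data.
sample = pd.DataFrame({"date": ["2020-03-01", "2020-03-02"], "total": [15, 19]})
(result,) = azureml_main(sample)
print(result)
```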

Then two Join Data controls are used to join the previous and next dates with the aggregated data set. Select Columns in Dataset and Edit Metadata are used to select the data and rename the columns, respectively.
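The two Join Data controls match each row's previous and next dates against the aggregated data. A rough pandas equivalent, continuing with the hypothetical prevdate and nextdate columns, is sketched below. The pretotal and nexttotal names match the attributes used in Step 2; the helper add_neighbor_totals and the choice of left joins are assumptions.

```python
import pandas as pd


def add_neighbor_totals(df: pd.DataFrame) -> pd.DataFrame:
    """Attach the totals of the previous and next dates to each row."""
    totals = df[["date", "total"]]
    # Rename the lookup copies so the join keys and values do not collide.
    prev = totals.rename(columns={"date": "prevdate", "total": "pretotal"})
    nxt = totals.rename(columns={"date": "nextdate", "total": "nexttotal"})
    out = df.merge(prev, on="prevdate", how="left")
    out = out.merge(nxt, on="nextdate", how="left")
    return out


# Made-up aggregated data with previous and next dates already added.
dates = pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"])
df = pd.DataFrame({"date": dates, "total": [15, 90, 21]})
df["prevdate"] = df["date"] - pd.Timedelta(days=1)
df["nextdate"] = df["date"] + pd.Timedelta(days=1)
joined = add_neighbor_totals(df)
print(joined[["date", "pretotal", "total", "nexttotal"]])
```

The first and last dates have no neighbor on one side, so their pretotal or nexttotal is missing (NaN) after the left joins.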

Step 2: Apply the weighted average of previous and next values

Both data sets are joined on the dates so that the previous and next values appear in the same row, as shown below.

Applying the weighted average of previous and next values

Next, we generate the weighted average of the pretotal and nexttotal attributes with a script in the Apply SQL Transformation control.
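The script is not reproduced in the text. Assuming the joined data set reaches Apply SQL Transformation as t1 with date, pretotal and nexttotal columns, a weighted average could be computed with a query like the one below. The weights 0.5 and 0.5 are placeholders; inside the control they would be written as literals, while the Python wrapper here binds them as parameters and runs the query against an in-memory SQLite table.

```python
import sqlite3


def weighted_replacement(rows, w_prev=0.5, w_next=0.5):
    """Compute total = w_prev * pretotal + w_next * nexttotal per row.

    rows are (date, pretotal, nexttotal) tuples; the weights are placeholders.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t1 (date TEXT, pretotal REAL, nexttotal REAL)")
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", rows)
    query = ("SELECT date, ? * pretotal + ? * nexttotal AS total "
             "FROM t1 ORDER BY date")
    result = con.execute(query, (w_prev, w_next)).fetchall()
    con.close()
    return result


print(weighted_replacement([("2020-03-02", 15, 21)]))  # [('2020-03-02', 18.0)]
print(weighted_replacement([("2020-03-02", 15, 21)], w_prev=1.0, w_next=0.0))
```

Setting one weight to zero, as in the second call, replaces the anomaly with the previous value only.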

If you want to replace the anomaly value with only the previous or only the next value, you can simply set the weight of the unwanted component to zero. After the weighted average is calculated, we add the non-anomalous rows back and run the Time Series Anomaly Detection again. You will see that one of the anomalous records is eliminated, while one record still remains.

Conclusion

In this article, we looked at another Azure Machine Learning control named Time Series Anomaly Detection. Since a time series is a very complex dataset, it can contain a lot of anomalous data. Using different parameters, we can identify anomalous data in the time series. Further, we extended the Azure Machine Learning experiment to replace the anomalies with the weighted average of the previous and next values.

Further References

Table of contents

Introduction to Azure Machine Learning using Azure ML Studio
Data Cleansing in Azure Machine Learning
Prediction in Azure Machine Learning
Feature Selection in Azure Machine Learning
Data Reduction Technique: Principal Component Analysis in Azure Machine Learning
Prediction with Regression in Azure Machine Learning
Prediction with Classification in Azure Machine Learning
Comparing models in Azure Machine Learning
Cross Validation in Azure Machine Learning
Clustering in Azure Machine Learning
Tune Model Hyperparameters for Azure Machine Learning models
Time Series Anomaly Detection in Azure Machine Learning
Designing Recommender Systems in Azure Machine Learning
Language Detection in Azure Machine Learning with basic Text Analytics Techniques
Azure Machine Learning: Named Entity Recognition in Text Analytics
Filter based Feature Selection in Text Analytics
Latent Dirichlet Allocation in Text Analytics
Recommender Systems for Customer Reviews
AutoML in Azure Machine Learning
AutoML in Azure Machine Learning for Regression and Time Series
Building Ensemble Classifiers in Azure Machine Learning
Text Classification in Azure Machine Learning using Word Vectors