Anomaly Detection: Review and preliminary Entropy method tests

Pelumi Oluwasanya
DOI: https://doi.org/10.48550/arXiv.1708.08813
2017-08-29
Abstract: Anomalies are strange data points; they usually represent an unusual occurrence. Anomaly detection is presented from the perspective of wireless sensor networks. Different approaches have been taken in the past, as we will see, not only to identify outliers but also to establish the statistical properties of the different methods. The usual goal is to show that the approach is asymptotically efficient and to establish whether the metric used is unbiased or biased. This project is based on the work of [1]. The approach rests on the principle that the entropy of the data increases when an anomalous data point is measured, so the entropy of the data set must be estimated. This report presents preliminary efforts at confirming the results of [1]. Since no parametric form is assumed, the probability density function (pdf) of the data set is first estimated using a data-split method. The estimated pdf values are then plugged into the entropy estimation formula to estimate the entropy of the data set. The data (test signal) used in this report is Gaussian distributed with zero mean and variance 4. Results of pdf estimation using the k-nearest neighbour method on the entire data set and using a data-split method are presented and compared based on how well they approximate the probability density function of a Gaussian with the same mean and variance. The number of nearest neighbours chosen for this report is 8. This choice is arbitrary, but reasonable, since the number of anomalies introduced is expected to be less than this after the data split. The data-split method is preferred, and rightly so.
Machine Learning
What problem does this paper attempt to address?
This paper addresses anomaly detection in wireless sensor networks. Specifically, its goal is to detect data points that differ from normal data in wireless sensor networks and to evaluate how well an entropy-estimation-based method performs at this task.

### Problem Background

Wireless Sensor Networks (WSNs) are widely used in fields such as environmental monitoring, traffic control, and agricultural monitoring. These networks collect and transmit data through a large number of distributed sensor nodes. In practice, however, sensors may generate abnormal data for various reasons, such as sensor failures, malicious attacks, or environmental interference. If these abnormal data are not identified and handled, they may lead to wrong decisions or diagnoses.

### Paper Goals

The main goal of the paper is to develop an effective anomaly detection method that can accurately identify abnormal data points in wireless sensor networks. Specifically, the author adopts an entropy-estimation-based method. The core idea is that when abnormal data points appear in a data set, the entropy of the data set increases. Therefore, by estimating the entropy of the data set, abnormal data points can be detected indirectly.

### Main Steps

1. **Literature Review**: A comprehensive review of existing anomaly detection methods was carried out, and the advantages and limitations of the various methods were analyzed.
2. **Preliminary Experiments**: Experiments were carried out using synthetic data and actual measurement data to verify the effectiveness of the entropy-estimation-based anomaly detection method.
3. **Algorithm Implementation**:
   - Use the k-Nearest Neighbor (k-NN) method to estimate the probability density function (PDF) of the data.
   - Calculate the entropy of the data set based on the estimated PDF.
4. **Performance Evaluation**: Evaluate the accuracy of the detection results with statistical tools such as Receiver Operating Characteristic (ROC) curves and Quantile-Quantile (Q-Q) plots.
5. **Future Work**: Possible directions for improving algorithm performance were proposed, and plans for subsequent research were outlined.

### Key Formulas

The following key formulas help make the entropy-based anomaly detection method clearer:

- **Probability Density Function (PDF) Estimation**:
  \[ \hat{p}(x)=\frac{k}{n V_k(x)} \]
  where \( k \) is the number of nearest neighbors, \( n \) is the total number of samples, and \( V_k(x) \) is the volume of the region centered at \( x \) that contains its \( k \) nearest neighbors.
- **Entropy Estimation**:
  \[ H(X)=-\sum_{i = 1}^{n}\hat{p}(x_i)\log\hat{p}(x_i) \]
  where \( H(X) \) represents the entropy of the data set \( X \), and \( \hat{p}(x_i) \) is the probability density estimate of the \( i \)-th sample.

### Summary

This paper aims to address the challenges of anomaly detection in wireless sensor networks with an entropy-estimation-based method. Through a review of existing methods and preliminary experiments, the effectiveness of this method has been verified, and directions for future improvements have been provided.
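As an illustration of the k-NN density formula and the entropy-estimation step described above, the following Python sketch estimates a one-dimensional density and then an entropy from held-out points. Several choices are assumptions and are not taken from the paper: the function names `knn_density_1d` and `split_entropy_estimate` are hypothetical, the data split is a random half/half split, the volume \( V_k(x) \) is taken as the interval length \( 2 r_k(x) \) (with \( r_k \) the distance to the \( k \)-th nearest neighbor), and the entropy is computed as the sample average \( -\frac{1}{m}\sum_i \log\hat{p}(x_i) \), a common plug-in form for differential entropy, rather than the sum written in the formula above. The test signal follows the abstract: Gaussian, zero mean, variance 4, with \( k = 8 \).

```python
import numpy as np


def knn_density_1d(train, query, k=8):
    """k-NN density estimate p_hat(x) = k / (n * V_k(x)) for 1-D data.

    Assumption: V_k(x) is the length of the interval [x - r_k, x + r_k],
    where r_k is the distance from x to its k-th nearest training point.
    """
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    n = train.size
    # Pairwise absolute distances, shape (len(query), n).
    dists = np.abs(query[:, None] - train[None, :])
    # k-th smallest distance in each row (index k - 1 after partitioning).
    r_k = np.partition(dists, k - 1, axis=1)[:, k - 1]
    volume = 2.0 * r_k
    return k / (n * volume)


def split_entropy_estimate(data, k=8, seed=None):
    """Entropy estimate using a data split (hypothetical helper).

    One half of the data builds the density estimate; the other half is
    where the density is evaluated. The estimate is the sample average
    of -log p_hat over the held-out half (an assumed plug-in form).
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(data, dtype=float))
    half = shuffled.size // 2
    train, held_out = shuffled[:half], shuffled[half:]
    p_hat = knn_density_1d(train, held_out, k=k)
    return -np.mean(np.log(p_hat))


# Test signal from the abstract: zero mean, variance 4 (sigma = 2).
sigma = 2.0
samples = np.random.default_rng(0).normal(0.0, sigma, size=4000)

estimate = split_entropy_estimate(samples, k=8, seed=1)
# Closed-form differential entropy of a Gaussian, in nats.
true_entropy = 0.5 * np.log(2.0 * np.pi * np.e * sigma**2)

print(f"estimated entropy: {estimate:.3f} nats")
print(f"Gaussian entropy:  {true_entropy:.3f} nats")
```

With these settings the estimate should land close to the closed-form Gaussian entropy \( \tfrac{1}{2}\log(2\pi e\sigma^2) \approx 2.11 \) nats. A small bias that depends on \( k \) is expected for k-NN plug-in estimators, so exact agreement is not the goal of this sketch.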