Clustering-based Anomaly Detection for microservices

Roman Nikiforov
DOI: https://doi.org/10.48550/arXiv.1810.02762
2018-10-05
Abstract: Anomaly detection is an important step in the management and monitoring of data centers and cloud computing platforms. The ability to detect anomalous virtual machines before real failures occur results in reduced downtime while operations engineers urgently recover malfunctioning virtual machines, efficient root cause analysis, and improved customer optics in the event said malfunction leads to an outage. Virtual machines could fail at any time, whether in a lab or production system. If there is no anomaly detection system, and a virtual machine in a lab environment fails, the QA and DEV teams will have to switch to another environment while the OPS team fixes the failure. Failing to detect anomalous virtual machines can have financial ramifications, both when developing new features and when servicing existing ones. This paper presents a model that can efficiently detect anomalous virtual machines in both production and testing environments.
Distributed, Parallel, and Cluster Computing, Machine Learning
What problem does this paper attempt to address?
This paper addresses the problem of detecting anomalous virtual machines (VMs) in data centers and cloud computing platforms, in both production and test environments. In a microservice architecture, each service usually runs as an independent virtual machine and has its own performance metrics, such as CPU load, memory usage, and disk usage. Because the failure of a single component may cause the entire system to malfunction, it is important to identify potentially faulty components in advance. This matters especially in development and test environments: although they mirror the production model, they usually lack a complete high-availability (HA) configuration in order to save resources. If anomalous virtual machines are not detected in time, the development and test teams lose productivity, and financial losses may follow. The paper proposes a method based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to detect anomalous virtual machines. The method first groups virtual machines according to the components they run, and then applies the DBSCAN algorithm within each group to identify outliers. After the outliers are identified, a statistics-based review is carried out to verify the results. According to the paper, this method can handle large amounts of data and discover virtual machines that deviate from normal behavior, thereby reducing system downtime, making root-cause analysis more efficient, and improving the customer experience when a failure occurs.
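
The procedure of grouping by component and then clustering can be illustrated with a minimal Python sketch. It is not the paper's implementation. The VM records, the metric names (`cpu`, `mem`, `disk`), the `eps` and `min_samples` values, and the helper names are all hypothetical. The summary does not say which statistical review is used, so the check here (deviation from the group median measured in median absolute deviations) is an assumption chosen purely for illustration. Noise points are found directly from the DBSCAN definition: a point is noise if it is neither a core point nor within `eps` of one.

```python
from collections import defaultdict
from math import dist
from statistics import median


def dbscan_noise(points, eps, min_samples):
    """Return the indices of the points that DBSCAN would label as noise."""
    n = len(points)
    # Neighborhood of each point within eps (the point itself is included).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    # Core points have at least min_samples points in their neighborhood.
    core = {i for i in range(n) if len(neighbors[i]) >= min_samples}
    # Noise: not a core point and not within eps of any core point.
    return [
        i for i in range(n)
        if i not in core and not any(j in core for j in neighbors[i])
    ]


def confirmed_by_stats(point, group_points, threshold=3.5):
    """Assumed review step: is any metric far from the group median?"""
    for k, value in enumerate(point):
        column = [p[k] for p in group_points]
        med = median(column)
        mad = median(abs(v - med) for v in column)
        if mad == 0:
            # No spread in the group: any deviation at all stands out.
            if value != med:
                return True
            continue
        if abs(value - med) / mad > threshold:
            return True
    return False


def detect_anomalous_vms(vms, eps=10.0, min_samples=3):
    """Group VMs by component, run DBSCAN per group, then review outliers."""
    groups = defaultdict(list)
    for vm in vms:
        groups[vm["component"]].append(vm)

    anomalies = []
    for members in groups.values():
        points = [(vm["cpu"], vm["mem"], vm["disk"]) for vm in members]
        for i in dbscan_noise(points, eps, min_samples):
            if confirmed_by_stats(points[i], points):
                anomalies.append(members[i]["name"])
    return anomalies


# Hypothetical utilization percentages for two component groups.
vms = [
    {"name": "api-1", "component": "api", "cpu": 20, "mem": 40, "disk": 30},
    {"name": "api-2", "component": "api", "cpu": 22, "mem": 42, "disk": 31},
    {"name": "api-3", "component": "api", "cpu": 21, "mem": 39, "disk": 29},
    {"name": "api-4", "component": "api", "cpu": 19, "mem": 41, "disk": 30},
    {"name": "api-5", "component": "api", "cpu": 95, "mem": 90, "disk": 30},
    {"name": "db-1", "component": "db", "cpu": 60, "mem": 70, "disk": 50},
    {"name": "db-2", "component": "db", "cpu": 62, "mem": 71, "disk": 52},
    {"name": "db-3", "component": "db", "cpu": 61, "mem": 69, "disk": 51},
    {"name": "db-4", "component": "db", "cpu": 59, "mem": 72, "disk": 49},
]

anomalies = detect_anomalous_vms(vms)
print(anomalies)  # ['api-5']
```

Grouping before clustering is what keeps the `db` machines from being flagged: their CPU load of about 60% is normal for that component, even though it would stand out among the `api` machines. In practice, `eps` and `min_samples` would need tuning per component, and the metrics would likely need scaling to comparable ranges first.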