RARE: a labeled dataset for cloud-native memory anomalies

Francesco Lomio,Diego Martínez Baselga,Sergio Moreschini,Heikki Huttunen,Davide Taibi
DOI: https://doi.org/10.1145/3416505.3423560
2020-11-13
Abstract:Anomaly detection has been attracting interest from both the industry and the research community for many years, as the number of published papers and services adopted grew exponentially over the last decade. One of the reasons behind this is the wide adoption of cloud systems from the majority of players in multiple industries, such as online shopping, advertisement or remote computing. In this work we propose a Dataset foR cloud-nAtive memoRy anomaliEs: RARE. It includes labelled anomaly time-series data, comprising of over 900 unique metrics. This dataset has been generated using a microservice for injecting artificial byte stream in order to overload the nodes, provoking memory anomalies, which in some cases resulted in a crash. The system was built using a Kafka server deployed on a Kubernetes system. Moreover, in order to get access and download the metrics related to the server, we utilised Prometheus. In this paper we present a dataset that can be used coupled with machine learning algorithms for detecting anomalies in a cloud based system. The dataset will be available in the form of CSV file through an online repository. Moreover, we also included an example of application using a Random Forest algorithm for classifying the data as anomalous or not. The goal of the RARE dataset is to help in the development of more accurate and reliable machine learning methods for anomaly detection in cloud based systems.
What problem does this paper attempt to address?