Lightweight and Adaptive Service API Performance Monitoring in Highly Dynamic Cloud Environment

Jingmin Xu, Yuan Wang, Pengfei Chen, Ping Wang
DOI: https://doi.org/10.1109/SCC.2017.80
2017-01-01
Abstract: Cloud platforms and services usually provide an API layer as a decoupled, language-agnostic interface for both front-end client integration and back-end data and/or function access. Because APIs serve as the interaction endpoints, their availability and performance have a significant impact on the quality of the end-user or client experience. However, the extreme dynamics, complexity, and scale of current cloud platforms challenge the applicability of existing performance monitoring and anomaly detection approaches in terms of timeliness, accuracy, and scalability. This paper presents a novel approach to API performance monitoring, which recognizes performance problems by the deviation of response time from a baseline response time / throughput model that is created and continuously updated through online learning. In the post-detection phase, an MIC (Maximal Information Criteria) based correlation algorithm is used to group alerts into higher-level and more informative hyper-alerts for end-user notification. We prototyped our solution for a large-scale commercial cloud platform, evaluated it using three months of API performance metrics data, and compared it with several existing representative algorithms and tools. The results show that our approach is able to detect API performance anomalies with a high F1-score. Compared with an existing Granger-based approach, our approach achieves a nearly one-fold increase in F1-score. Moreover, the alert reduction ratio of our approach outperforms several state-of-the-art approaches.
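To make the detection idea in the abstract concrete, the following is a minimal sketch of a response time / throughput baseline that is updated online and flags large deviations. The abstract does not specify the model form, so everything here is an assumption for illustration: the linear fit, the exponential forgetting factor, the 3-sigma rule, and the names `OnlineBaseline`, `observe`, and `run_demo` are hypothetical and not taken from the paper.

```python
import math


class OnlineBaseline:
    """Hypothetical baseline: response_time ~ intercept + slope * throughput.

    The fit uses exponentially weighted least squares, so older samples
    fade out and the baseline can follow a changing environment.
    """

    def __init__(self, forgetting=0.98, threshold=3.0, warmup=20):
        self.forgetting = forgetting  # assumed decay applied to old samples
        self.threshold = threshold    # assumed number of sigmas for an alert
        self.warmup = warmup          # samples to learn before alerting
        self.n = 0
        # Decayed sufficient statistics for the linear fit.
        self.w = self.sx = self.sy = self.sxx = self.sxy = 0.0
        # Decayed weight and sum of squared prediction residuals.
        self.rw = self.srr = 0.0

    def predict(self, throughput):
        if self.w == 0.0:
            return 0.0
        denom = self.w * self.sxx - self.sx * self.sx
        if denom <= 1e-9 * self.w * self.sxx:
            # Not enough spread in throughput yet: fall back to the mean.
            return self.sy / self.w
        slope = (self.w * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.w
        return intercept + slope * throughput

    def observe(self, throughput, response_time):
        """Return True if the sample deviates from the baseline.

        Samples judged anomalous are not learned from, so a spike does
        not contaminate the baseline.
        """
        residual = response_time - self.predict(throughput)
        anomalous = False
        if self.n >= self.warmup and self.rw > 0.0:
            sigma = math.sqrt(self.srr / self.rw)
            anomalous = abs(residual) > self.threshold * max(sigma, 1e-9)
        if not anomalous:
            f = self.forgetting
            if self.n > 0:
                self.rw = f * self.rw + 1.0
                self.srr = f * self.srr + residual * residual
            self.w = f * self.w + 1.0
            self.sx = f * self.sx + throughput
            self.sy = f * self.sy + response_time
            self.sxx = f * self.sxx + throughput * throughput
            self.sxy = f * self.sxy + throughput * response_time
            self.n += 1
        return anomalous


def run_demo():
    """Feed synthetic traffic, then one injected latency spike."""
    model = OnlineBaseline()
    flags = []
    for i in range(200):
        throughput = 100 + (i * 13) % 50
        noise = ((i * 7) % 5 - 2) * 0.5  # small deterministic jitter
        flags.append(model.observe(throughput, 50 + 0.1 * throughput + noise))
    flags.append(model.observe(120, 500.0))  # injected spike
    flags.append(model.observe(120, 62.0))   # back to normal
    return flags
```

Skipping the update on flagged samples is a design choice of this sketch: it keeps a transient spike out of the baseline, but a lasting shift in behavior would keep raising alerts until the model is reset or allowed to relearn.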