FlowPinpoint: Localizing Anomalies in Cloud-client Services for Cloud Providers

Ruopeng Geng, Chongrong Fang, Shiyang Guo, Daxiang Kang, Biao Lyu, Shunmin Zhu, Peng Cheng
DOI: https://doi.org/10.1109/tcc.2023.3257162
IF: 5.697
2023-01-01
IEEE Transactions on Cloud Computing
Abstract: For public cloud providers, maintaining the availability of cloud services is critical, and it requires efficient anomaly diagnosis and recovery. The first step toward both is to localize anomalies, i.e., to determine where they occur along the network path of cloud-client services. We propose FlowPinpoint to perform anomaly localization for cloud providers. FlowPinpoint collects per-flow statistics at the cloud network gateways (i.e., gateway flowlogs), so the collected data reflects information from both the cloud side and the Internet side. The datacenter-scale gateway flowlogs are aggregated and associated on Alibaba's big data computing platform. To exclude interference from anomaly-unrelated flowlogs, we propose a two-layer filter consisting of an indicator-based filter and an isolation forest filter. Finally, the anomaly localization analyzer classifies the flowlogs and, based on the classification results, determines whether the anomaly lies inside the cloud network. FlowPinpoint is implemented and tested in the production environment of Alibaba Cloud, where it correctly localized 1 anomaly inside the cloud and 6 anomalies on the Internet over 4 months.
computer science, information systems, theory & methods
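
The two-layer filter described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the indicators, thresholds, or filter direction, so the `FlowLog` fields (`retrans_rate`, `rtt_ms`), the threshold values, and the choice to keep high-scoring outliers in the second layer are all assumptions made for illustration. The isolation forest is a small pure-Python version of the standard algorithm (random axis-aligned splits, score = 2^(-mean path length / c(n))).

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowLog:
    flow_id: str
    retrans_rate: float  # hypothetical indicator: fraction of retransmitted packets
    rtt_ms: float        # hypothetical indicator: round-trip time in milliseconds


def indicator_filter(logs, max_retrans=0.01, max_rtt_ms=100.0):
    """Layer 1 (assumed form): keep flowlogs with at least one indicator above its threshold."""
    return [f for f in logs if f.retrans_rate > max_retrans or f.rtt_ms > max_rtt_ms]


def _c(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build(points, depth, max_depth, rng):
    """Grow one isolation tree using random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            _build(left, depth + 1, max_depth, rng),
            _build(right, depth + 1, max_depth, rng))


def _path_length(tree, point, depth=0):
    """Depth at which `point` lands, plus the expected extra depth of its leaf."""
    if tree[0] == "leaf":
        return depth + _c(tree[1])
    _, dim, split, left, right = tree
    child = left if point[dim] < split else right
    return _path_length(child, point, depth + 1)


def isolation_scores(points, n_trees=100, sample_size=256, seed=0):
    """Return one anomaly score in (0, 1] per point; higher means easier to isolate."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(size)) if size > 1 else 0
    trees = [_build(rng.sample(points, size), 0, max_depth, rng) for _ in range(n_trees)]
    norm = _c(size)
    scores = []
    for p in points:
        mean_path = sum(_path_length(t, p) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_path / norm) if norm > 0 else 0.5)
    return scores


def two_layer_filter(logs, score_threshold=0.8):
    """Layer 1 then layer 2 (assumed direction: keep flowlogs that score as outliers)."""
    candidates = indicator_filter(logs)
    points = [(f.retrans_rate, f.rtt_ms) for f in candidates]
    scores = isolation_scores(points)
    return [f for f, s in zip(candidates, scores) if s > score_threshold]
```

Calling `two_layer_filter(logs)` on a list of `FlowLog` records returns the subset that passes both layers. In this sketch the first layer cheaply discards flows with healthy indicators, so the more expensive isolation forest only scores the remaining candidates.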