Localizing Root Causes of Performance Anomalies in Cloud Computing Systems by Analyzing Request Trace Logs

HaiBo Mi,HuaiMin Wang,YangFan Zhou,Michael R. Lyu,Hua Cai
DOI: https://doi.org/10.1007/s11432-012-4747-8
2012-01-01
Science China Information Sciences
Abstract:It is hard to localize the primary cause of performance anomalies in cloud computing systems because of the complexity of interactions between components.The hidden connections in the huge number of request execution paths in such systems usually contain useful information for diagnosing performance anomalies.We propose an approach to localize anomalous invoked methods and their physical locations by leveraging request trace logs,which involves two steps:(1) firstly,cluster the requests according to their corresponding call sequences,identify anomalous requests with principal component analysis,and then pick out anomalous methods with Mann-Whitney hypothesis test;(2) secondly,compare the behavior similarities of all replicated instances of the anomalous methods with Jensen-Shannon divergence,and select the ones whose behaviors are different from those of others,which will be chosen as the final culprits of performance anomalies.We conduct experiments with four real-world cases to validate our approach in Alibaba Cloud Computing Inc.The results demonstrate that our approach can locate the prime causes of performance anomalies with the low false-positive rate and false-negative rate.
What problem does this paper attempt to address?