G-RCA: a Generic Root Cause Analysis Platform for Service Quality Management in Large IP Networks

He Yan, Lee Breslau, Zihui Ge, Dan Massey, Dan Pei, Jennifer Yates
DOI: https://doi.org/10.1109/tnet.2012.2188837
2012-01-01
IEEE/ACM Transactions on Networking
Abstract: An increasingly diverse set of applications, such as Internet games, streaming videos, e-commerce, online banking, and even mission-critical emergency call services, all rely on IP networks. In such an environment, best-effort service is no longer acceptable. This requires a transformation in network management from detecting and replacing individual faulty network elements to managing the end-to-end service quality as a whole. In this paper, we describe the design and development of a Generic Root Cause Analysis platform (G-RCA) for service quality management (SQM) in large IP networks. G-RCA contains a comprehensive service dependency model that incorporates topological and cross-layer relationships, protocol interactions, and control plane dependencies. G-RCA abstracts the root cause analysis process into signature identification for symptom and diagnostic events, temporal and spatial event correlation, and reasoning and inference logic. G-RCA provides a flexible rule specification language that allows operators to quickly customize G-RCA and provide different root cause analysis tools as new problems need to be investigated. G-RCA is also integrated with data trending, manual data exploration, and statistical correlation mining capabilities. G-RCA has proven to be a highly effective SQM platform in several different applications, and we present results regarding BGP flaps, PIM flaps in Multicast VPN service, and end-to-end throughput degradation in content delivery network (CDN) service.
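The three-stage abstraction named in the abstract (event signatures, temporal and spatial correlation, then reasoning) can be illustrated with a small sketch. This is not G-RCA's rule specification language or implementation, which the abstract does not show. The `Event` and `Rule` types, the `DEPENDS_ON` table, the `diagnose` function, and the priority-based tie-breaking are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str      # event signature, e.g. "bgp_flap" (hypothetical label)
    location: str  # network element where the event was observed
    time: float    # seconds since an arbitrary epoch


@dataclass(frozen=True)
class Rule:
    symptom: str     # symptom signature this rule applies to
    diagnostic: str  # diagnostic signature that may explain the symptom
    window: float    # maximum time difference in seconds (temporal join)
    priority: int    # assumed tie-breaker: higher priority wins


# Hypothetical stand-in for a service dependency model:
# each location maps to the elements it depends on.
DEPENDS_ON = {
    "router-a": {"router-a", "link-1"},
    "router-b": {"router-b", "link-2"},
}


def diagnose(symptom, diagnostics, rules, depends_on):
    """Return the name of the best-matching diagnostic event, or "unknown"."""
    best = None
    for rule in rules:
        if rule.symptom != symptom.name:
            continue
        for ev in diagnostics:
            if ev.name != rule.diagnostic:
                continue
            # Spatial correlation: the diagnostic must sit on an element
            # that the symptom's location depends on.
            related = depends_on.get(symptom.location, {symptom.location})
            spatial = ev.location in related
            # Temporal correlation: the two events must be close in time.
            temporal = abs(ev.time - symptom.time) <= rule.window
            if spatial and temporal:
                if best is None or rule.priority > best[0].priority:
                    best = (rule, ev)
    return best[1].name if best else "unknown"


rules = [
    Rule("bgp_flap", "link_down", window=5.0, priority=2),
    Rule("bgp_flap", "cpu_high", window=60.0, priority=1),
]
diagnostics = [
    Event("link_down", "link-1", 98.0),
    Event("cpu_high", "router-a", 99.0),
    Event("link_down", "link-2", 99.0),
]
symptom = Event("bgp_flap", "router-a", 100.0)
print(diagnose(symptom, diagnostics, rules, DEPENDS_ON))
```

In this sketch both the link failure and the CPU event correlate with the symptom in time and space, and the assumed priority ordering selects the link failure as the more likely cause. A real deployment would derive the dependency table from topology and routing data, not from a hand-written dictionary.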