Minimizing Wide-Area Performance Disruptions in Inter-Domain Routing

Yi Zhu
2011-01-01
Abstract: The Internet is the platform for most of our communication needs today. The networks underlying the Internet undergo continual change, both planned changes (e.g., adding a new router) and unplanned failures. Unfortunately, these changes can lead to performance disruptions that affect the user experience, so network operators must quickly diagnose and fix any problems that arise. Diagnosing wide-area performance disruptions is challenging for three reasons. First, each network has limited visibility into other networks, so network operators must collect and analyze measurements of routing and traffic data to infer the root cause of a disruption. Second, many potential factors can lead to performance disruptions, and these factors are usually interdependent. Third, there is no formalized way to define metrics and classify performance disruptions by cause, so network diagnosis is usually done in an ad hoc manner. The thesis conducts two case studies that diagnose wide-area performance disruptions from the perspectives of a large tier-1 Internet Service Provider (ISP) and a large content distribution network (CDN). i) From the ISP's perspective, we designed and implemented a system that tracks inter-domain route changes at scale and in real time; the system can serve as a building block for many ISP diagnosis tools. ii) From the CDN's perspective, we focus on diagnosing wide-area network changes that increase the latency of accessing the CDN's services. We designed a method for automatically classifying large latency increases and evaluated our techniques on one month of measurement data to identify the major sources of high latency for the CDN. Stepping back, the difficulties in network diagnosis can be traced back to the inter-domain routing protocol itself.