Scalable deterministic overlay network diagnosis
Y Zhao,Y Chen,D Bindel
2006-01-01
Abstract: Internet fault diagnosis is important to end users, overlay network service providers (like Akamai), and Internet service providers (ISPs). For example, with Internet fault diagnosis tools, users can choose more reliable ISPs. However, the modern Internet is heterogeneous and largely unregulated, which makes Internet diagnosis an increasingly challenging problem. Though several router-based Internet diagnosis tools have been proposed [1], [2], these tools generally depend on ICMP measurements. ICMP-based tools are subject to ICMP rate limiting, are sensitive to cross-traffic, and do not scale.
In contrast, many recently developed tools for Internet tomography use signal processing and statistical approaches to infer link-level properties [3], [4], [5], [6] or shared congestion [7] based on end-to-end measurements of IP routing paths. We define paths to be IP routing paths between pairs of end hosts; paths are made up of links, which are IP connections between routers. The latency along a path is the sum of the latencies along the links that make up the path, and other path properties can similarly be expressed in terms of link properties. The relation between path and link properties can be written as a large linear system; however, as we observed in [8], the linear system is fundamentally underconstrained: there exist unidentifiable links [9], [8] with properties that cannot be uniquely determined from path measurements. In order to estimate the properties of unidentifiable links, Internet tomography tools often impose statistical assumptions; thus, the accuracy of the predicted link properties is subject to uncertainty in the model assumptions.
As shown below, such statistics-based tools are neither deterministic nor scalable. Existing tomography systems analyze the temporal correlations among multiple receivers in a multicast-like environment; with enough probes, they can infer the loss rate of each path segment with high probability.
However, their inference results are not deterministic or unique, for two reasons. First, they can only achieve 100% determinism with infinitely many probes. Second, while these systems can obtain very high probability estimates with a certain number of probes (the exact number depends on the depth of the tree and the number of receivers), they assume an ideal multicast environment. Because multicast does not really exist in the Internet, they have to approximate it with unicast. Thus the inference accuracy depends heavily on the cross traffic in the network, and there is no guarantee or bound on the inference accuracy. Furthermore, the iterative refinement algorithms used to compute the link properties are expensive for large networks and may not always converge. Thus it remains an open problem to find which links or sequences of links can be uniquely characterized from end-to-end measurements, a problem we tackle in this paper.
Problem Definition and Solution
Here we define the granularity as the length of a sequence of links on a path. We would like a fine-grained characterization of the overlay network behavior, i.e., to characterize the properties of very short sequences of links. Fine-grained characterization is important for congestion and failure diagnosis, since the granularity determines how well we can localize problems. Because the linear system relating link properties to path properties is underconstrained even for a very large overlay network [8], we cannot resolve the properties of each link individually. What, then, is the finest granularity we can attain?
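To make the underconstrained linear system concrete, the following is a minimal sketch in Python and is not taken from the paper. The four-link, three-path topology, the latency values, and the names rank, path_latencies, and identifiable are all hypothetical and chosen only for illustration. The sketch uses the standard linear-algebra fact that a link or link sequence is uniquely determined by the path measurements exactly when its indicator vector lies in the row space of the path-by-link matrix; this is not necessarily how the authors compute it.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via Gauss-Jordan elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

# Hypothetical topology (an assumption, not from the paper):
# G[i][j] = 1 if measured path i traverses link j.
G = [
    [1, 1, 1, 0],  # path 0 uses links 0, 1, 2
    [1, 1, 0, 1],  # path 1 uses links 0, 1, 3
    [0, 0, 1, 1],  # path 2 uses links 2, 3
]

def path_latencies(G, link_latencies):
    """Path latency is the sum of the latencies of the links on the path."""
    return [sum(g * x for g, x in zip(row, link_latencies)) for row in G]

def identifiable(G, indicator):
    """True if the summed property of the indicated links is uniquely
    determined by the path measurements, i.e. the indicator vector
    lies in the row space of G."""
    return rank(G + [indicator]) == rank(G)

# Two different (made-up) link latency assignments in milliseconds.
x_a = [10, 5, 3, 7]
x_b = [5, 10, 3, 7]

print(rank(G), "independent equations for", len(G[0]), "unknown links")
print(path_latencies(G, x_a), path_latencies(G, x_b))
print(identifiable(G, [1, 0, 0, 0]))  # link 0 alone
print(identifiable(G, [1, 1, 0, 0]))  # links 0 and 1 as one sequence
```

In this toy topology, links 0 and 1 always appear together, so the two latency assignments produce identical path measurements and neither link can be resolved alone, while the two-link sequence can. The granularity question above then asks how short such uniquely characterizable sequences can be made.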