Surviving Failures with Performance-Centric Bandwidth Allocation in Private Datacenters

Li Chen, Baochun Li, Bo Li
DOI: https://doi.org/10.1109/ic2e.2016.34
2016-01-01
Abstract: In private datacenters operated by Web service providers such as Google, multiple applications built on data-parallel frameworks such as MapReduce coexist and share a limited supply of link bandwidth. It has been shown that failures in datacenters are the norm rather than the exception. Failures degrade the performance of data-parallel applications, because failed tasks must be relaunched and placed on newly selected servers. In this paper, we argue that even in the presence of failures, link bandwidth should be allocated to competing applications with performance-centric fairness, meaning that the performance each application achieves is proportional to its weight. We formulate and solve the open challenge of jointly optimizing placement decisions for relaunched tasks and bandwidth allocation, so that the adverse effects of failures on application performance are minimized. We implement our proposed algorithm in the Mininet emulation testbed, and our experiments show that our solutions are effective at minimizing the negative effects of failures while still achieving performance-centric fairness.
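To make "performance proportional to weight" concrete, the sketch below works through a deliberately simplified model. It is not the paper's formulation. It assumes that an application's performance is the reciprocal of the time it needs to move its traffic, that each application has a fixed volume to send over each link, and that relaunched-task placement reduces to choosing which link carries the relaunched task's traffic. Under these assumptions, every application i can finish in 1 / (k * w_i) time units only if each link l satisfies k * sum_i w_i * d_{i,l} <= C_l. The largest feasible k is therefore the minimum over links of C_l / sum_i w_i * d_{i,l}. The names `fair_scale`, `allocate`, and `place_relaunched` are hypothetical, and the greedy placement is only a stand-in for the joint optimization the paper describes.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical data model (an assumption, not the paper's formulation):
#   demand[app][link] = volume the app must move over that link (e.g., Mb)
#   weight[app]       = the app's weight under performance-centric fairness
#   capacity[link]    = link bandwidth (e.g., Mb/s)
# "Performance" is assumed to be 1 / completion time of the app's transfers.
Demand = Dict[str, Dict[str, float]]


def fair_scale(demand: Demand, weight: Dict[str, float],
               capacity: Dict[str, float]) -> float:
    """Largest k such that every app i can finish in 1 / (k * w_i) time units.

    Finishing on time needs rate k * w_i * d_{i,l} on every link l the app
    uses, so each link imposes k * sum_i w_i * d_{i,l} <= C_l.
    """
    k = math.inf
    for link, cap in capacity.items():
        load = sum(weight[app] * vols.get(link, 0.0) for app, vols in demand.items())
        if load > 0:
            k = min(k, cap / load)  # this link's bottleneck on the common scale k
    if math.isinf(k):
        raise ValueError("no capacity-constrained link carries any demand")
    return k


def allocate(demand: Demand, weight: Dict[str, float],
             capacity: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Per-flow rates: app i gets k * w_i * d_{i,l} on each link l it uses."""
    k = fair_scale(demand, weight, capacity)
    return {
        app: {link: k * weight[app] * vol for link, vol in vols.items()}
        for app, vols in demand.items()
    }


def place_relaunched(demand: Demand, weight: Dict[str, float],
                     capacity: Dict[str, float], app: str, volume: float,
                     candidate_links: List[str]) -> Tuple[Optional[str], float]:
    """Hypothetical greedy placement: try each candidate link for the
    relaunched task's traffic and keep the one that yields the largest k."""
    best_link, best_k = None, -math.inf
    for link in candidate_links:
        trial = {a: dict(vols) for a, vols in demand.items()}  # copy, don't mutate input
        trial.setdefault(app, {})
        trial[app][link] = trial[app].get(link, 0.0) + volume
        k = fair_scale(trial, weight, capacity)
        if k > best_k:
            best_link, best_k = link, k
    return best_link, best_k


if __name__ == "__main__":
    weight = {"A": 2.0, "B": 1.0}
    demand = {"A": {"l1": 50.0}, "B": {"l1": 100.0}}
    capacity = {"l1": 100.0, "l2": 100.0}
    print(fair_scale(demand, weight, capacity))  # 0.5
    print(allocate(demand, weight, capacity))  # {'A': {'l1': 50.0}, 'B': {'l1': 50.0}}
    print(place_relaunched(demand, weight, capacity, "B", 40.0, ["l1", "l2"]))  # ('l2', 0.5)
```

In this example, application A (weight 2) and application B (weight 1) share link l1, and each receives 50 Mb/s. A therefore finishes in 1 time unit and B in 2, so their performance stands in the 2:1 ratio of their weights. Placing B's relaunched 40 Mb of traffic on l2 keeps k at 0.5, whereas placing it on l1 would lower k to 100/240. This allocation is not work-conserving: links other than the bottleneck may keep spare bandwidth, which is one reason a real system would likely solve a richer optimization.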