Optimizing the Topologies of Virtual Networks for Cloud-Based Big Data Processing

Cong Xu, Jiahai Yang, Hui Yu, Haizhuo Lin, Hui Zhang
DOI: https://doi.org/10.1109/HPCC.2014.36
2014-01-01
Abstract: Cloud-based big data platforms are being widely adopted in industry because they simplify the implementation of big data processing and enable an elastic service framework. Alongside the widespread adoption of cloud-based MapReduce frameworks, a series of solutions have been proposed to improve the performance of big data services in the cloud. Most existing studies concentrate on optimizing task scheduling or resource provisioning mechanisms to improve the platform's data processing or communication performance separately, without considering both performance factors together. Moreover, these studies seldom consider the impact of virtual network topologies on the performance of MapReduce workflows. The purpose of this work is to optimize the topologies of virtual networks used in cloud-based MapReduce frameworks. We formulate both the data transmission and data processing overhead of a specific cloud-based big data application, describe the optimal deployment of virtual networks as an optimization problem, and then design algorithms to solve this problem. Experimental results show that our topology optimization mechanism improves the overall performance of cloud-based big data applications dramatically.