An Efficient Grouped Virtual MapReduce Cluster

Yang Yang, Xiang Long, Bo Jiang
DOI: https://doi.org/10.1109/AINA.2013.15
2013-01-01
Abstract: Virtualization technology and the MapReduce programming model are key tools of the big data and cloud computing era. Combined, they offer easy management, fast deployment, scalability, and high efficiency. The downside is that performance is limited by the I/O bottleneck of virtual machines (VMs), because a MapReduce cluster deployed in VMs must handle a huge amount of data. Data locality, a crucial factor affecting performance in a shared cluster environment, can be used to ease this bottleneck and reduce application execution time. We present the Grouped Virtual MapReduce Cluster (GVMC), a framework that takes full advantage of VM data locality to improve the performance of a Virtual MapReduce Cluster (VMC). The local-master nodes introduced in GVMC not only offload pressure from the master node but also lower communication cost. We compare the organization of three different VMCs, describe the architecture of our cluster framework, and analyze its performance. Our experiments demonstrate that GVMC achieves higher locality and reduces execution time for both CPU-intensive and I/O-intensive applications. Compared with the Original Virtual MapReduce Cluster (OVMC), GVMC improves performance by up to 16.5% for CPU-intensive applications and up to 36.2% for I/O-intensive applications.