SLDP: A Novel Data Placement Strategy for Large-Scale Heterogeneous Hadoop Cluster

Runqun Xiong, Junzhou Luo, Fang Dong
DOI: https://doi.org/10.1109/cbd.2014.57
2014-01-01
Abstract: Hadoop, a popular open-source implementation of MapReduce, is widely used for large-scale data-intensive applications such as data mining, web indexing, and scientific computing. The current Hadoop implementation assumes that the nodes in a cluster are homogeneous, and the Hadoop Distributed File System (HDFS) distributes data to multiple nodes based on disk space availability. Such a data placement strategy is very efficient for homogeneous environments, where nodes are identical in terms of both computing power and disk capacity. Unfortunately, in practice, the homogeneity assumption does not always hold. In heterogeneous environments, Hadoop's scheduler suffers severe performance degradation and energy dissipation when it relies on the default data placement strategy of HDFS. In this paper, we propose a novel snakelike data placement mechanism (SLDP) for large-scale heterogeneous Hadoop clusters. SLDP first adopts a heterogeneity-aware algorithm to divide the nodes into several virtual storage tiers (VSTs), and then places data blocks across the nodes in each VST circuitously according to the hotness of the data. Furthermore, SLDP uses hotness-proportional replication to reduce disk space consumption and also has an effective power control function. Experimental results on two real data-intensive applications show that SLDP is energy-efficient and space-saving, and that it significantly improves MapReduce performance in a heterogeneous Hadoop cluster.
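The abstract does not give the algorithm's details, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes that nodes can be ranked by a single hypothetical `capacity_score`, that tiers are formed by an equal-sized split of that ranking, and that "snakelike" means visiting the nodes of a tier back and forth. The names `Node`, `build_tiers`, `snake_order`, and `place_blocks` are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity_score: float  # hypothetical combined measure of CPU and disk capability
    blocks: list = field(default_factory=list)


def build_tiers(nodes, num_tiers):
    """Rank nodes by capability and split them into tiers, strongest first.

    Assumption: an equal-sized split stands in for the paper's
    heterogeneity-aware grouping, which the abstract does not specify.
    """
    ranked = sorted(nodes, key=lambda n: n.capacity_score, reverse=True)
    size = -(-len(ranked) // num_tiers)  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def snake_order(tier, count):
    """Return `count` nodes from a tier in back-and-forth (snake) order."""
    sequence = list(tier) + list(tier)[::-1]  # e.g. a, b, c, c, b, a
    return [sequence[i % len(sequence)] for i in range(count)]


def place_blocks(blocks, tiers):
    """Assign (block_id, hotness) pairs so hotter blocks land on stronger tiers."""
    ranked = sorted(blocks, key=lambda b: b[1], reverse=True)
    per_tier = -(-len(ranked) // len(tiers))  # ceiling division
    for t, tier in enumerate(tiers):
        chunk = ranked[t * per_tier:(t + 1) * per_tier]
        for (block_id, _hotness), node in zip(chunk, snake_order(tier, len(chunk))):
            node.blocks.append(block_id)
```

With a forward-then-backward visiting order, the node that receives the hottest block of a pass also receives the coldest one, which tends to even out the total hotness per node within a tier. Replication and power control, which the abstract also mentions, are left out of this sketch.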