An Improved Data Placement Strategy in a Heterogeneous Hadoop Cluster

Wentao Zhao, Lingjun Meng, Jiangfeng Sun, Yang Ding, Haohao Zhao, Lina Wang
DOI: https://doi.org/10.2174/1874110x01509010792
2015-01-01
The Open Cybernetics & Systemics Journal
Abstract: Hadoop Distributed File System (HDFS) is designed to store big data reliably and to stream these data at high bandwidth to user applications. However, the default HDFS block placement policy assumes that all nodes in the cluster are homogeneous and places blocks randomly without considering any node's resource characteristics, which reduces the self-adaptability of the system. In this paper, we take node heterogeneity into account, such as the utilization of each node's disk space, and put forward an improved block placement strategy that addresses some drawbacks of the default HDFS policy. Simulation experiments indicate that our improved strategy not only distributes data much better but also takes significantly less time than the default block placement.
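The abstract does not spell out the improved strategy, so the following is only an illustrative sketch of the general idea it names: choosing a target node by disk-space utilization instead of at random. It is a toy simulation, not the authors' algorithm and not HDFS code. The names `DataNode`, `place_random`, `place_least_utilized`, `simulate`, and `spread`, as well as the capacities and block size, are assumptions made for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class DataNode:
    """Hypothetical stand-in for a storage node with a fixed disk capacity."""
    name: str
    capacity_gb: float
    used_gb: float = 0.0

    @property
    def utilization(self) -> float:
        # Fraction of this node's disk that is already occupied.
        return self.used_gb / self.capacity_gb


def place_random(nodes, rng):
    """Baseline: pick a node uniformly at random, ignoring its resources."""
    return rng.choice(nodes)


def place_least_utilized(nodes):
    """Assumed heterogeneity-aware rule: pick the node with the lowest utilization."""
    return min(nodes, key=lambda n: n.utilization)


def simulate(policy, capacities_gb, n_blocks, block_gb=0.125):
    """Place n_blocks blocks with the given policy; return per-node utilization."""
    nodes = [DataNode(f"dn{i}", c) for i, c in enumerate(capacities_gb)]
    for _ in range(n_blocks):
        policy(nodes).used_gb += block_gb
    return [n.utilization for n in nodes]


def spread(utilizations):
    """Gap between the fullest and emptiest node; smaller means better balance."""
    return max(utilizations) - min(utilizations)


if __name__ == "__main__":
    capacities = [100, 200, 400]  # heterogeneous disks, in GB (made-up values)
    rng = random.Random(0)
    random_util = simulate(lambda ns: place_random(ns, rng), capacities, 800)
    aware_util = simulate(place_least_utilized, capacities, 800)
    print("random spread:", round(spread(random_util), 4))
    print("utilization-aware spread:", round(spread(aware_util), 4))
```

In this toy setting, random placement fills the small disk proportionally faster than the large ones, while the utilization-aware rule keeps all nodes at nearly the same fill fraction. A real policy would also have to respect replication and rack constraints, which the sketch leaves out.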