An Improved Data Placement Strategy for Hadoop

Lin Wei-wei
DOI: https://doi.org/10.3969/j.issn.1000-565X.2012.01.026
2012-01-01
Abstract: In the default data placement strategy for Hadoop, restoring data from a remote DataNode takes a long time when the local replicas become unavailable, and load balancing may be disrupted because DataNodes are selected at random for data storage. To solve these problems, an improved data placement strategy is proposed, which chooses the most appropriate DataNode for remote replicas according to a scheduling evaluation value computed for each DataNode from its network distance and data load. The strategy thereby balances the storage load and achieves efficient data transmission. The proposed strategy is implemented on the Hadoop platform, and the results show that it is superior to the default data placement strategy because it improves load balancing for data storage and reduces the time for data placement.
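
The abstract does not give the formula for the scheduling evaluation value, so the following is only an illustrative sketch of the general idea: score each candidate DataNode from its network distance and data load, then place the remote replica on the best-scoring node. The weighted-sum form, the `distance_weight` parameter, and the names `DataNode`, `evaluation_value`, and `choose_remote_target` are assumptions made for this example and are not taken from the paper or from Hadoop's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNode:
    """Hypothetical view of a candidate node (not a Hadoop class)."""
    name: str
    network_distance: int  # assumed: topology distance from the writer
    used_bytes: int
    capacity_bytes: int


def evaluation_value(node, max_distance, distance_weight=0.5):
    """Return a score where lower is better.

    Assumed form: a weighted sum of normalized network distance and
    storage utilization. The paper's actual formula may differ.
    """
    distance_term = node.network_distance / max_distance if max_distance else 0.0
    load_term = node.used_bytes / node.capacity_bytes
    return distance_weight * distance_term + (1.0 - distance_weight) * load_term


def choose_remote_target(candidates, distance_weight=0.5):
    """Pick the candidate with the lowest evaluation value."""
    if not candidates:
        raise ValueError("no candidate DataNodes")
    max_distance = max(n.network_distance for n in candidates)
    return min(
        candidates,
        key=lambda n: evaluation_value(n, max_distance, distance_weight),
    )


# Example candidates with made-up distances and loads.
nodes = [
    DataNode("dn1", network_distance=2, used_bytes=80, capacity_bytes=100),
    DataNode("dn2", network_distance=4, used_bytes=20, capacity_bytes=100),
    DataNode("dn3", network_distance=4, used_bytes=90, capacity_bytes=100),
]
target = choose_remote_target(nodes)
```

In this sketch, raising `distance_weight` favors nearby nodes (faster transfer and recovery), while lowering it favors lightly loaded nodes (better storage balance). Random selection, as in the default strategy described above, takes neither factor into account.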