Zput: A speedy data uploading approach for the Hadoop Distributed File System

Youwei Wang, Weiping Wang, Can Ma, Dan Meng
DOI: https://doi.org/10.1109/CLUSTER.2013.6702648
2013-01-01
Abstract: The Hadoop Distributed File System (HDFS) is the storage component of the Hadoop framework and is designed to maintain and process huge datasets efficiently across cluster nodes. Before MapReduce, Hadoop's computation infrastructure, can work on the data, the data must be uploaded from local file systems to HDFS. Unfortunately, when the data is massive in scale, the uploading procedure becomes extremely time-consuming and seriously delays urgent tasks. The primary contribution of this paper is Zput, a speedy data uploading mechanism that significantly accelerates uploading by using a metadata mapping approach. After describing the implementation and its advantages, the paper analyzes the mechanism's disadvantages and eliminates them with an approach named remote block placement. Evaluation results show that Zput reduces the running time of the uploading process by about 60-90%, and that remote block placement speeds up block distribution by about 30-40%, while maintaining complete compatibility with upper-layer applications.
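The abstract does not spell out how metadata mapping works. As a rough illustration only, assuming the idea is to describe a file that already sits on local disk as a sequence of fixed-size HDFS-style blocks instead of streaming its bytes through the normal write path, the sketch below computes such a block layout from the file size alone. The names `BlockMapping` and `map_file_to_blocks`, the record fields, and the 128 MiB block size in the usage note are hypothetical and are not taken from the paper or from the HDFS API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BlockMapping:
    """Hypothetical metadata record: one block backed by a byte range of a local file."""
    block_index: int
    local_path: str
    offset: int
    length: int


def map_file_to_blocks(local_path: str, file_size: int, block_size: int) -> List[BlockMapping]:
    """Describe a local file as consecutive fixed-size blocks.

    Only metadata (offset and length per block) is produced; the file's
    bytes are never read or copied, which is the intuition behind
    mapping instead of uploading. This is an illustrative assumption,
    not the paper's implementation.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_size < 0:
        raise ValueError("file_size must be non-negative")

    mappings: List[BlockMapping] = []
    offset = 0
    index = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        mappings.append(BlockMapping(index, local_path, offset, length))
        offset += length
        index += 1
    return mappings
```

For example, with an assumed block size of 128 MiB, a 300 MiB file maps to three records of 128 MiB, 128 MiB, and 44 MiB at offsets 0, 128 MiB, and 256 MiB. In such a scheme every block would initially live on the uploading node, which is the kind of placement imbalance a step like remote block placement would then need to address.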