An Approach of Fast Data Manipulation in HDFS with Supplementary Mechanisms

Youwei Wang, Can Ma, Weiping Wang, Dan Meng
DOI: https://doi.org/10.1007/s11227-014-1287-6
2014-01-01
Abstract: The Hadoop framework has been widely deployed in a variety of clusters to build large, scalable, and powerful systems for massive data processing on commodity hardware. The Hadoop Distributed File System (HDFS), the distributed storage component of Hadoop, is responsible for managing vast amounts of data effectively in large clusters. To use Map/Reduce, the parallel processing infrastructure of Hadoop, the traditional workflow must first upload data from local file systems to HDFS. Unfortunately, when dealing with massive data, the upload procedure becomes extremely time-consuming, causing almost intolerable delays for urgent tasks, along with unnecessary waste of space due to replicated data. The primary contribution of this paper is the proposal of Zput and its supplementary mechanism, Zport. After describing the implementation, we introduce several refinements that are significant for runtime efficiency and performance. Evaluation results show that Zput can accelerate the local data uploading procedure by over 315.4%, while Zport can boost remote block distribution by over 190.3%. In addition, compatibility with upper-layer applications remains intact.