An Optimized Approach for Storing Small Files on HDFS-based on Dynamic Queue

Weipeng Jing, Danyu Tong
DOI: https://doi.org/10.1109/iiki.2016.55
2016-01-01
Abstract: With the rapid development of social networks, massive numbers of small files urgently need to be handled effectively. Unfortunately, HDFS (Hadoop Distributed File System) performs poorly on massive numbers of small files: because the NameNode keeps the metadata of every file in memory, many small files place a heavy burden on it, and read performance is poor. To solve this problem, this paper proposes a method called DQSF (Dynamic Queue of Small File). It designs an appropriate queue for files of different sizes, and these queues serve as the basis for merging small files. The method is based on the Analytic Hierarchy Process (AHP): it computes a comprehensive system index over file-reading performance, memory usage, and merging efficiency, and takes the queue size that gives the best performance as the dynamic queue value for the corresponding size range. Before merging, it also classifies small files with a text categorization algorithm based on period features. At the same time, it speeds up file reading through a hierarchical index and an index prefetching mechanism. Experimental results show that this strategy reduces memory usage, improves the efficiency of accessing massive numbers of small files, and substantially improves overall system performance.
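To make the AHP step concrete, here is a minimal sketch of how a queue size could be chosen from a weighted comprehensive index. It is not the paper's implementation: the pairwise comparison matrix, the candidate queue sizes, and their normalized scores are all hypothetical, and the weights are approximated with the standard row geometric-mean method rather than an exact eigenvector computation.

```python
import math

# Criteria order: read performance, memory usage, merge efficiency.
# HYPOTHETICAL pairwise judgments (Saaty 1-9 scale), not taken from the paper.
PAIRWISE = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 3.0],
    [1 / 5, 1 / 3, 1.0],
]

# Saaty's random consistency index for matrices of order n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}

# HYPOTHETICAL candidate queue sizes (MB) -> normalized scores in [0, 1]
# for (read, memory, merge); higher is better for every criterion.
CANDIDATES = {
    32: [0.6, 0.9, 0.5],
    64: [0.8, 0.7, 0.8],
    128: [0.9, 0.4, 0.7],
}


def ahp_weights(matrix):
    """Approximate the AHP priority vector with the row geometric-mean method."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]


def consistency_ratio(matrix, weights):
    """CR = CI / RI, where CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    ri = RANDOM_INDEX[n]
    return ci / ri if ri else 0.0


def comprehensive_index(scores, weights):
    """Weighted sum of the normalized criterion scores."""
    return sum(s * w for s, w in zip(scores, weights))


def best_queue_size(candidates, weights):
    """Return the candidate queue size with the highest comprehensive index."""
    return max(candidates, key=lambda size: comprehensive_index(candidates[size], weights))


if __name__ == "__main__":
    w = ahp_weights(PAIRWISE)
    print("weights:", [round(x, 3) for x in w])
    print("CR:", round(consistency_ratio(PAIRWISE, w), 3))
    print("best queue size (MB):", best_queue_size(CANDIDATES, w))
```

A consistency ratio below 0.1 is the usual AHP threshold for accepting the pairwise judgments; in practice the candidate scores would come from measured read latency, NameNode memory, and merge time rather than fixed numbers.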
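The size-dependent queues that drive merging can be sketched as follows, assuming (hypothetically) that each queue covers a range of file sizes, has a fixed byte capacity, and is flushed into one merged file when the next file would overflow it. The class and method names are illustrative only, and the sketch records offsets in memory instead of writing to HDFS.

```python
from dataclasses import dataclass, field


@dataclass
class MergeQueue:
    capacity: int                              # max bytes per merged file (hypothetical)
    files: list = field(default_factory=list)  # pending (name, size) pairs
    used: int = 0                              # bytes currently queued


class DynamicQueueMerger:
    """Hypothetical size-bucketed merger; not the paper's DQSF code."""

    def __init__(self, ranges):
        # ranges: ascending list of (upper_size_bound, queue_capacity) in bytes.
        self.ranges = ranges
        self.queues = [MergeQueue(cap) for _, cap in ranges]
        self.merged = []  # each entry: list of (name, offset, length)

    def _queue_for(self, size):
        # Pick the first queue whose size range covers this file.
        for i, (bound, _) in enumerate(self.ranges):
            if size <= bound:
                return self.queues[i]
        raise ValueError("file too large to be treated as a small file")

    def add(self, name, size):
        q = self._queue_for(size)
        if size > q.capacity:
            raise ValueError("file exceeds queue capacity")
        if q.used + size > q.capacity:
            self._flush(q)  # queue is full: emit one merged file first
        q.files.append((name, size))
        q.used += size

    def _flush(self, q):
        if not q.files:
            return
        offset, entries = 0, []
        for name, size in q.files:
            entries.append((name, offset, size))  # index entry for later reads
            offset += size
        self.merged.append(entries)
        q.files, q.used = [], 0

    def flush_all(self):
        for q in self.queues:
            self._flush(q)


if __name__ == "__main__":
    m = DynamicQueueMerger([(1024, 4096), (16384, 65536)])
    for name in "abcde":
        m.add(name, 1000)
    m.add("f", 10000)
    m.flush_all()
    print(m.merged)
```

Bucketing by size keeps files of similar size together, so each queue's capacity can be tuned independently, which is where a per-range value such as the AHP-selected queue size would plug in.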
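The hierarchical index with prefetching might look like the sketch below. This is an assumed two-level design, not the paper's: level 1 maps a small-file name to its merged file, level 2 maps names to (offset, length) inside that merged file, and "prefetching" is modeled as loading the whole level-2 table of a merged file on first access so that later reads of neighboring files hit an LRU cache. All names are hypothetical.

```python
from collections import OrderedDict


class HierarchicalIndex:
    """Hypothetical two-level small-file index with level-2 prefetching."""

    def __init__(self, merged_indexes, cache_capacity=2):
        # merged_indexes: {merged_id: [(name, offset, length), ...]}
        self.level1 = {}        # small-file name -> merged-file id (in memory)
        self.level2_store = {}  # stands in for per-merged-file index files on disk
        for mid, entries in merged_indexes.items():
            self.level2_store[mid] = {n: (off, ln) for n, off, ln in entries}
            for n, _, _ in entries:
                self.level1[n] = mid
        self.cache = OrderedDict()  # LRU cache of prefetched level-2 tables
        self.cache_capacity = cache_capacity
        self.store_loads = 0        # counts simulated backing-store fetches

    def _load_level2(self, mid):
        if mid in self.cache:
            self.cache.move_to_end(mid)  # mark as most recently used
            return self.cache[mid]
        self.store_loads += 1
        table = self.level2_store[mid]   # prefetch the entire table at once
        self.cache[mid] = table
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return table

    def lookup(self, name):
        mid = self.level1[name]
        offset, length = self._load_level2(mid)[name]
        return mid, offset, length


if __name__ == "__main__":
    idx = HierarchicalIndex({0: [("a", 0, 10), ("b", 10, 20)], 1: [("c", 0, 5)]})
    print(idx.lookup("a"), idx.lookup("b"), idx.store_loads)
```

Because files merged together are likely to be read together, fetching a merged file's full index on the first access lets subsequent lookups avoid another round trip to the backing store.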