A Novel Approach for Efficient Accessing of Small Files in HDFS: TLB-MapFile

Meng Bing, Guo Wei-Bin, Fan Gui-Sheng, Qian Neng-Wu
DOI: https://doi.org/10.1109/SNPD.2016.7515978
2016-01-01
Abstract: The Hadoop Distributed File System (HDFS) was originally designed for streaming access to large files, so its access and storage efficiency is low for massive numbers of small files. This paper presents TLB-MapFile, an access optimization approach for small files in HDFS based on MapFile. TLB-MapFile merges massive numbers of small files into large files using the MapFile mechanism to reduce NameNode memory consumption, and adds a fast table structure (TLB) in the DataNode to improve the retrieval efficiency of small files. First, small files are merged into large files according to the MapFile mechanism and stored in HDFS. Second, the access frequency and the ordered queue of small files (per unit time) are obtained from the HDFS system audit logs, and the mapping information between blocks and small files is stored in the TLB table, which is updated regularly. TLB-MapFile improves the access efficiency of small files through an a priori prefetching strategy based on the TLB table. Experimental results show that this method can effectively reduce NameNode memory consumption and improve the reading speed of small files.
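The second step of the abstract (deriving access frequencies from audit logs and keeping a regularly refreshed block mapping for the hottest files) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `capacity` parameter, the block identifiers, and the simplified sample log lines are all assumptions made for illustration. Only the `cmd=` and `src=` fields are taken from the usual shape of HDFS audit log entries, and the prefetching step itself is not modeled.

```python
import re
from collections import Counter

# Extracts the cmd= and src= fields from an audit-style log line.
AUDIT_RE = re.compile(r"cmd=(\S+)\s+src=(\S+)")


def access_counts(log_lines):
    """Count read ('open') operations per file path."""
    counts = Counter()
    for line in log_lines:
        match = AUDIT_RE.search(line)
        if match and match.group(1) == "open":
            counts[match.group(2)] += 1
    return counts


def build_tlb(counts, block_of, capacity):
    """Map the `capacity` most frequently read files to their block.

    `block_of` stands in for the full small-file-to-block index
    (hypothetical; in the paper this role is played by the MapFile).
    """
    tlb = {}
    for path, _ in counts.most_common():
        if len(tlb) == capacity:
            break
        if path in block_of:
            tlb[path] = block_of[path]
    return tlb


def lookup(path, tlb, block_of):
    """Return (block, hit), consulting the fast table before the full index."""
    if path in tlb:
        return tlb[path], True
    return block_of[path], False


# Simplified, made-up log lines (real entries carry more fields).
log_lines = [
    "allowed=true ugi=alice ip=/10.0.0.1 cmd=open src=/data/a.txt dst=null",
    "allowed=true ugi=alice ip=/10.0.0.1 cmd=open src=/data/c.txt dst=null",
    "allowed=true ugi=bob ip=/10.0.0.2 cmd=open src=/data/a.txt dst=null",
    "allowed=true ugi=bob ip=/10.0.0.2 cmd=create src=/data/d.txt dst=null",
    "allowed=true ugi=bob ip=/10.0.0.2 cmd=open src=/data/b.txt dst=null",
    "allowed=true ugi=alice ip=/10.0.0.1 cmd=open src=/data/c.txt dst=null",
    "allowed=true ugi=alice ip=/10.0.0.1 cmd=open src=/data/a.txt dst=null",
]

# Hypothetical index from small file to the merged block that holds it.
block_of = {"/data/a.txt": "blk_1", "/data/b.txt": "blk_1", "/data/c.txt": "blk_2"}

counts = access_counts(log_lines)
tlb = build_tlb(counts, block_of, capacity=2)
```

Rebuilding `tlb` from a fresh window of log lines at a fixed interval would correspond to the regular update described in the abstract. A lookup that misses the table falls back to the full index, so the table only affects speed, not correctness.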