Optimization of small file access based on MapFile in HDFS

Longzhen DUAN, Xinli HONG, Taorong QIU
DOI: https://doi.org/10.13764/j.cnki.ncdg.2017.02.014
2017-01-01
Abstract: The Hadoop Distributed File System (HDFS) performs very well when accessing large files, but it is inefficient when accessing massive numbers of small files. To address this, this paper proposes a new strategy for optimizing small-file access. When small files are stored, the Client Node merges them into MapFiles according to file type and access rights, and HDFS then handles these merged large files. When small files are read, a cache module is introduced that consists of a buffer area for Nexist files, Cache L1, and Cache L2. Experiments show that this strategy effectively reduces NameNode memory consumption when massive numbers of small files are accessed, shortens small-file access time, and greatly improves concurrent access performance.
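The store and read paths described in the abstract can be illustrated with a small in-memory Python simulation. This is a sketch under stated assumptions, not the authors' implementation, and it makes no Hadoop calls (a real client would use Hadoop's Java `MapFile.Writer`/`MapFile.Reader`). The assumptions are as follows. Files are grouped by a `(type, access rights)` key, and each group's entries are sorted by name because MapFile requires keys to be appended in sorted order. The "Nexist" buffer is taken to record names already known not to exist, so that repeated misses skip HDFS. L1 and L2 are modeled as a small and a larger LRU cache of file contents. `merge_small_files`, `LRU`, `SmallFileReader`, and the `fetch` callback are hypothetical names.

```python
from collections import OrderedDict, defaultdict
from typing import Callable, Dict, List, Optional, Tuple


def merge_small_files(
    files: List[Tuple[str, str, str, bytes]],
) -> Dict[Tuple[str, str], List[Tuple[str, bytes]]]:
    """Group (name, type, perms, data) records by (type, perms); each group
    stands in for one merged MapFile, with entries sorted by file name."""
    groups: Dict[Tuple[str, str], List[Tuple[str, bytes]]] = defaultdict(list)
    for name, ftype, perms, data in files:
        groups[(ftype, perms)].append((name, data))
    return {key: sorted(entries, key=lambda e: e[0]) for key, entries in groups.items()}


class LRU:
    """Minimal least-recently-used cache (assumed eviction policy)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key: str, value: bytes) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


class SmallFileReader:
    """Read path: Nexist buffer -> Cache L1 -> Cache L2 -> merged file on HDFS."""

    def __init__(self, fetch: Callable[[str], Optional[bytes]], l1_size: int = 2, l2_size: int = 8):
        self.fetch = fetch          # hypothetical: reads one entry from a merged MapFile
        self.nexist = set()         # names known not to exist (assumed meaning of "Nexist")
        self.l1 = LRU(l1_size)
        self.l2 = LRU(l2_size)
        self.hdfs_reads = 0         # counts fallbacks to the simulated HDFS

    def read(self, name: str) -> Optional[bytes]:
        if name in self.nexist:
            return None             # known miss: no HDFS access needed
        data = self.l1.get(name)
        if data is not None:
            return data
        data = self.l2.get(name)
        if data is not None:
            self.l1.put(name, data)  # promote to L1 on an L2 hit
            return data
        self.hdfs_reads += 1
        data = self.fetch(name)
        if data is None:
            self.nexist.add(name)
            return None
        self.l1.put(name, data)
        self.l2.put(name, data)
        return data


files = [
    ("b.txt", "text", "rw-r--r--", b"beta"),
    ("a.txt", "text", "rw-r--r--", b"alpha"),
    ("x.png", "image", "rw-------", b"\x89PNG"),
]
mapfiles = merge_small_files(files)
# A flat name -> content dict stands in for reading the merged MapFiles on HDFS.
store = {name: data for entries in mapfiles.values() for name, data in entries}
reader = SmallFileReader(store.get)
reader.read("a.txt")  # miss: one simulated HDFS read
reader.read("a.txt")  # served from L1
reader.read("zzz")    # not found: recorded in the Nexist buffer
reader.read("zzz")    # answered by the Nexist buffer, no HDFS read
print(reader.hdfs_reads)  # 2
```

Grouping by type and access rights keeps every file in a merged MapFile under a single permission setting. It also means the NameNode tracks one entry per merged file rather than one per small file, which is the source of the memory saving the abstract reports.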