A Virtual Shared Metadata Storage for HDFS
Jiang Zhou, Yong Chen, Xiaoyan Gu, Weiping Wang, Dan Meng
DOI: https://doi.org/10.1109/nas.2015.7255195
2015-01-01
Abstract: Hadoop is a popular open-source framework that allows distributed analysis of large datasets using the MapReduce programming model. Its distributed file system, HDFS, provides high-throughput access to datasets. HDFS can achieve a high-performance metadata service but has two disadvantages. First, when the metadata server stores metadata on persistent devices, it is restricted to read and write operations on local disks. Second, it lacks effective methods for metadata synchronization and replication, which are critical for metadata availability and reliability. In this research, we introduce a novel Virtual Shared Storage Pool (VSSP) concept and design for storing and sharing metadata in HDFS. The VSSP is a virtual storage device that is built on existing servers and is transparent to upper layers. Two strategies, a journal synchronization based on the 2PC protocol and a fine-grained image replication, are introduced in the VSSP according to different metadata access features. The VSSP not only reduces the overhead of metadata modification operations, but also improves the I/O performance of namespace storage. Experimental results show that the VSSP improved the average performance of log writes by 40.51% and 23.46% compared with BookKeeper and Hadoop QJM, respectively. The average image read and write throughput was nearly 5 times and 2.4 times better than NFS and the original approach, respectively. These results confirm that the proposed VSSP solution significantly improves metadata access performance, scalability, and reliability for HDFS.
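The abstract names a journal synchronization based on the 2PC (two-phase commit) protocol but does not give its details. As a rough illustration of the general technique only, and not of the authors' VSSP design, the following minimal Python sketch shows a coordinator that appends a journal entry to several replicas only if all of them vote yes in the prepare phase. The names `JournalReplica`, `write_journal_entry`, and the `healthy` flag are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class JournalReplica:
    """Hypothetical storage node holding a copy of the edit log (assumption)."""
    name: str
    healthy: bool = True
    staged: dict = field(default_factory=dict)   # txid -> entry awaiting commit
    journal: list = field(default_factory=list)  # committed (txid, entry) pairs

    def prepare(self, txid: int, entry: str) -> bool:
        # Phase 1: stage the entry and vote; an unhealthy node votes "no".
        if not self.healthy:
            return False
        self.staged[txid] = entry
        return True

    def commit(self, txid: int) -> None:
        # Phase 2 (all voted yes): move the staged entry into the journal.
        self.journal.append((txid, self.staged.pop(txid)))

    def abort(self, txid: int) -> None:
        # Phase 2 (some vote was no): discard anything staged for this txid.
        self.staged.pop(txid, None)


def write_journal_entry(replicas, txid: int, entry: str) -> bool:
    """Coordinator: commit only if every replica votes yes in phase 1."""
    votes = [r.prepare(txid, entry) for r in replicas]
    if all(votes):
        for r in replicas:
            r.commit(txid)
        return True
    for r in replicas:
        r.abort(txid)
    return False
```

In this sketch a single failed vote aborts the write on every replica, so all journals stay identical. A real system would also have to handle coordinator crashes, timeouts, and recovery, which are omitted here.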