An Experimental Study on Data Recovery Performance Improvement for HDFS with NVM

Huijie Li, Xin Li, Youyou Lu, Xiaolin Qin
DOI: https://doi.org/10.1109/icccn49398.2020.9209698
2020-01-01
Abstract: Non-Volatile Memory (NVM) is a promising device for storing data and accelerating big data analysis because of its excellent I/O performance. However, we find that simply replacing Hard Disk Drives (HDDs) with NVM does not bring the expected performance improvement. In this paper, we take data recovery in the Hadoop Distributed File System (HDFS) as a case study to investigate how to take advantage of the performance of NVM. We analyze the data recovery mechanism in HDFS and find that the configuration of replication tasks in the DataNode can significantly affect data recovery. We conduct extensive analysis and experiments to tune the configuration and report some interesting findings. With the new configuration, the data recovery performance improvement rises from 17% to 71%. The optimized configuration also improves the execution performance of MapReduce tasks by 28% to 59%.