Dynamic Data Storage and Management Strategies for Distributed File System

Feng Liu, Di Lin, Yao Qin, Yuan Gao, Jiang Cao
DOI: https://doi.org/10.1007/978-3-030-90196-7_50
2021-01-01
Abstract: HDFS is widely used in the field of big data, but it was originally designed for a homogeneous environment. HDFS adopts a static replica management strategy: once determined, the storage location and number of file replicas do not change. This strategy lowers overall system performance. In this paper, we propose an optimized replica management strategy, abbreviated as ORMP, to fix this problem. ORMP is based on file heat value and LSTM. The file heat value is proposed to evaluate the activity of files, and LSTM is used to predict the number of times files will be accessed. Based on the LSTM predictions, the file heat value can be updated regularly, so the storage location and number of replicas can be changed dynamically. Experiments show that ORMP is 22.08% faster in reading speed than the default replica management strategy.
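
The abstract does not give the heat-value formula or the rule that maps heat to a replica count, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical exponentially smoothed heat update and a hypothetical linear mapping from heat to replica count; the names `update_heat`, `replica_count`, `decay`, and `accesses_per_replica` are illustrative, and the predicted access count is passed in as a plain number where ORMP would use an LSTM forecast.

```python
import math


def update_heat(previous_heat, predicted_accesses, decay=0.5):
    """Blend the old heat value with a predicted access count.

    Hypothetical formula: the paper's actual heat definition is not
    stated in the abstract. `predicted_accesses` stands in for the
    LSTM forecast.
    """
    return decay * previous_heat + (1.0 - decay) * predicted_accesses


def replica_count(heat, min_replicas=3, max_replicas=10,
                  accesses_per_replica=100.0):
    """Map a heat value to a replica count (assumed linear rule).

    The result is clamped to [min_replicas, max_replicas]; the default
    lower bound of 3 mirrors the usual HDFS replication factor.
    """
    wanted = math.ceil(heat / accesses_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    heat = update_heat(previous_heat=100.0, predicted_accesses=800.0)
    print(heat, replica_count(heat))
```

Under these assumptions, a file whose predicted accesses rise gains replicas at the next periodic update, while a cold file falls back to the minimum count instead of keeping a fixed number of replicas forever.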