A Cache Sharing Mechanism Based on RDMA.
Xiao Zhang, Jinbo Yv, Yun Liu, Song Xiao, Nannan Zhao, Xiaonan Zhao
DOI: https://doi.org/10.1109/hpcc-dss-smartcity-dependsys57074.2022.00073
2022-01-01
Abstract: Caching improves storage system access efficiency and is widely used in operating systems, databases, and various Internet applications. However, the efficiency of a cache depends on its hit ratio and caching policy. In HDFS, cached data is accessible only to jobs running on the same node where the data is cached. The internal cache of each node is isolated, and tasks can use only the cached data of the node they run on. As a result, disk I/O, job scheduling, and data distribution significantly impact data processing efficiency. This paper presents a cache-sharing mechanism based on Remote Direct Memory Access (RDMA) for HDFS. The caches of the data nodes are connected through RDMA so that they are accessible from any node. Compared with the existing HDFS cache mechanism, the proposed scheme improves cache utilization, reduces access delay, and significantly improves the read and write performance of HDFS. Experimental results show that, compared with the existing HDFS cache mechanism, read throughput can be improved by 2.2 times and write throughput by 0.677 times. Furthermore, when the task runs on different nodes, the standard deviation of the read delay decreases from 12.22 s to 0.39 s, a significant reduction in I/O fluctuation.
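The difference between isolated and shared node caches can be illustrated with a toy model. The sketch below is not the paper's implementation: the names DataNode, read_isolated, and read_shared are hypothetical, and the peer lookup only stands in for a remote read over RDMA (no RDMA or HDFS API is called).

```python
from dataclasses import dataclass, field


@dataclass
class DataNode:
    """Toy data node with an in-memory block cache (hypothetical model)."""
    name: str
    cache: dict = field(default_factory=dict)


def read_isolated(local: DataNode, block_id: str) -> str:
    """Isolated caches: only the local node's cache can serve a hit."""
    return "local-cache" if block_id in local.cache else "disk"


def read_shared(local: DataNode, cluster: list, block_id: str) -> str:
    """Shared caches: try peer caches before falling back to disk.

    The peer lookup is a stand-in for a remote read over RDMA;
    no real RDMA call is made here.
    """
    if block_id in local.cache:
        return "local-cache"
    for node in cluster:
        if node is not local and block_id in node.cache:
            return f"remote-cache:{node.name}"
    return "disk"


# Block "b1" is cached on node A, but the task is scheduled on node B.
node_a = DataNode("A", {"b1": b"data"})
node_b = DataNode("B")
cluster = [node_a, node_b]

isolated_result = read_isolated(node_b, "b1")
shared_result = read_shared(node_b, cluster, "b1")
print(isolated_result)
print(shared_result)
```

In this model, a task scheduled away from the node holding the cached block falls back to disk when caches are isolated, but is served from a peer's cache when caches are shared, which is the scheduling sensitivity the abstract describes.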