Improving reading performance by file prefetching mechanism in distributed cache systems

Jing Gui, Yongbin Wang, Wuyue Shuai
DOI: https://doi.org/10.1002/cpe.8215
2024-07-19
Concurrency and Computation: Practice and Experience
Abstract: Distributed cache systems are used to improve I/O performance between computing applications and storage systems. However, the traditional file access predictors employed in these cache systems suit only workloads with simple file access patterns, which makes them inadequate for the complex access patterns found in big data computing scenarios. In this article, we propose DFAP, a file access predictor based on WaveNet, which shows promising results on file access prediction compared with other baseline models. Cache systems are often constrained by limited cache space due to cost, cluster size, and other factors, and in big data scenarios cached data and prefetched data often compete for that limited space. To address this issue, we introduce CBAP, a cache prefetching algorithm based on cost-benefit analysis, to improve cache utilization. Furthermore, we implement a novel file prefetching framework on Alluxio, which accelerates computing jobs by up to 18%.
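The abstract does not describe how CBAP weighs prefetched data against data already in the cache. As a rough illustration of what a cost-benefit prefetch decision can look like, here is a minimal sketch. It is NOT the authors' CBAP. It assumes a hypothetical predictor that supplies a per-file access probability (`access_prob`), models the miss penalty as the extra time needed to read a file from remote storage rather than from the cache (the bandwidths `remote_mb_per_s` and `local_mb_per_s` are hypothetical parameters), and admits a prefetch only when its expected benefit exceeds the expected benefit lost by evicting victims. Victims are chosen by lowest expected benefit per MB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    name: str
    size_mb: float       # file size in MB
    access_prob: float   # hypothetical predictor output: P(access in next window)


def miss_penalty(entry: FileEntry, remote_mb_per_s: float, local_mb_per_s: float) -> float:
    """Extra seconds spent reading the file from remote storage instead of the cache."""
    return entry.size_mb / remote_mb_per_s - entry.size_mb / local_mb_per_s


def expected_benefit(entry: FileEntry, remote_mb_per_s: float, local_mb_per_s: float) -> float:
    """Expected seconds saved by keeping (or placing) the file in the cache."""
    return entry.access_prob * miss_penalty(entry, remote_mb_per_s, local_mb_per_s)


def should_prefetch(candidate, cached, free_mb, remote_mb_per_s, local_mb_per_s):
    """Hypothetical cost-benefit admission test for one prefetch candidate.

    Returns (decision, victims): victims are the cached files to evict if the
    prefetch is admitted, or an empty list if it is rejected.
    """
    benefit = expected_benefit(candidate, remote_mb_per_s, local_mb_per_s)
    needed_mb = candidate.size_mb - free_mb
    victims, cost = [], 0.0
    if needed_mb > 0:
        # Evict files with the lowest expected benefit per MB first.
        by_density = sorted(
            cached,
            key=lambda e: expected_benefit(e, remote_mb_per_s, local_mb_per_s) / e.size_mb,
        )
        for entry in by_density:
            if needed_mb <= 0:
                break
            victims.append(entry)
            cost += expected_benefit(entry, remote_mb_per_s, local_mb_per_s)
            needed_mb -= entry.size_mb
        if needed_mb > 0:
            return False, []  # cannot free enough space even by evicting everything
    if benefit > cost:
        return True, victims
    return False, []


if __name__ == "__main__":
    cached = [FileEntry("a", 400, 0.1), FileEntry("b", 300, 0.8)]
    hot = FileEntry("c", 500, 0.9)
    print(should_prefetch(hot, cached, free_mb=200, remote_mb_per_s=100, local_mb_per_s=1000))
```

With 200 MB free, prefetching the likely-accessed file `c` requires evicting 300 MB. The sketch evicts the rarely accessed file `a` (expected loss of about 0.36 s), which is far less than `c`'s expected saving (about 4.05 s), so the prefetch is admitted. A candidate with a low predicted access probability would be rejected, because evicting `a` would cost more than the prefetch is expected to save.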
computer science, theory & methods, software engineering