SGDP: A Stream-Graph Neural Network Based Data Prefetcher

Yiyuan Yang, Rongshang Li, Qiquan Shi, Xijun Li, Gang Hu, Xing Li, Mingxuan Yuan
2023-10-11
Abstract: Data prefetching is important for storage system optimization and access performance improvement. Traditional prefetchers work well for mining the access patterns of sequential logical block addresses (LBAs) but cannot handle the complex non-sequential patterns that commonly exist in real-world applications. State-of-the-art (SOTA) learning-based prefetchers cover more LBA accesses. However, they do not adequately consider the spatial interdependencies between LBA deltas, which limits their performance and robustness. This paper proposes a novel Stream-Graph neural network-based Data Prefetcher (SGDP). Specifically, SGDP models LBA delta streams as a weighted directed graph that represents the interactive relations among LBA deltas, and then extracts hybrid features with graph neural networks for data prefetching. We conduct extensive experiments on eight real-world datasets. Empirical results verify that SGDP outperforms the SOTA methods by 6.21% in hit ratio and 7.00% in effective prefetching ratio, and speeds up inference by 3.13x on average. In addition, we generalize SGDP into variants that use different stream constructions, further expanding its application scenarios and demonstrating its robustness. SGDP offers a novel data prefetching solution and has been verified in commercial hybrid storage systems in the experimental phase. Our code and appendix are available at https://github.com/yyysjz1997/SGDP/.
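The abstract states that SGDP models an LBA delta stream as a weighted directed graph. The Python sketch below shows one plausible way to build such a graph, assuming each distinct delta is a node and the weight of edge u -> v counts how often delta v immediately followed delta u. The paper's actual construction (windowing, edge weighting, normalization) and its graph neural network are not reproduced here; the function names are hypothetical, and `predict_next_delta` is a naive heaviest-edge heuristic standing in for the GNN, not SGDP's method.

```python
from collections import defaultdict


def lba_deltas(lbas):
    """Differences between consecutive logical block addresses (LBAs)."""
    return [b - a for a, b in zip(lbas, lbas[1:])]


def build_delta_graph(deltas):
    """Build a weighted directed graph from an LBA delta stream.

    Assumption (not taken from the paper): each distinct delta is a node, and the
    weight of edge u -> v counts how often delta v immediately followed delta u.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for u, v in zip(deltas, deltas[1:]):
        counts[u][v] += 1
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {u: dict(nbrs) for u, nbrs in counts.items()}


def normalize(graph):
    """Scale each node's outgoing edge weights so they sum to 1."""
    result = {}
    for u, nbrs in graph.items():
        total = sum(nbrs.values())
        result[u] = {v: w / total for v, w in nbrs.items()}
    return result


def predict_next_delta(graph, last_delta):
    """Naive stand-in for a GNN predictor: follow the heaviest outgoing edge.

    Returns None when the last delta has never been followed by another delta.
    """
    nbrs = graph.get(last_delta)
    if not nbrs:
        return None
    return max(nbrs, key=nbrs.get)


if __name__ == "__main__":
    trace = [100, 101, 105, 106, 110, 111, 115]  # alternating +1 / +4 pattern
    deltas = lba_deltas(trace)                    # [1, 4, 1, 4, 1, 4]
    graph = build_delta_graph(deltas)             # {1: {4: 3}, 4: {1: 2}}
    nxt = predict_next_delta(graph, deltas[-1])   # 1
    print("prefetch LBA", trace[-1] + nxt)        # prefetch LBA 116
```

Even this toy graph captures a non-sequential but regular pattern that a pure next-block prefetcher would miss half the time; SGDP's contribution, per the abstract, is learning richer features over such graphs rather than reading off a single edge.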
Operating Systems; Distributed, Parallel, and Cluster Computing; Machine Learning
What problem does this paper attempt to address?
The paper addresses data prefetching for storage system optimization and access performance improvement. Traditional data prefetchers mine the access patterns of sequential logical block addresses (LBAs) well, but cannot handle the complex non-sequential patterns that are common in real-world applications. Existing learning-based prefetchers cover more LBA accesses, yet they do not adequately model the spatial interdependencies between LBA deltas, which limits their performance and robustness.

To address this, the paper proposes SGDP, a Stream-Graph neural network-based Data Prefetcher. SGDP represents the interactive relations within an LBA delta stream as a weighted directed graph and extracts hybrid features from that graph with graph neural networks to drive prefetching. The method targets applications with complex access patterns: on eight real-world datasets it outperforms state-of-the-art methods, improving the hit ratio by 6.21% and the effective prefetching ratio by 7.00% and speeding up inference by 3.13x on average.
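The two metrics cited above, and the claim that traditional prefetchers only handle sequential patterns, can be made concrete with a small replay harness. This Python sketch is an illustration under stated assumptions, not the paper's evaluation code: the metric definitions, the unbounded prefetch buffer with no eviction, and the names `sequential_prefetcher` and `evaluate` are all hypothetical.

```python
def sequential_prefetcher(history, degree=1):
    """Traditional next-block prefetcher: after accessing LBA x, fetch x+1 .. x+degree."""
    last = history[-1]
    return [last + i for i in range(1, degree + 1)]


def evaluate(trace, prefetcher):
    """Replay an LBA trace and return (hit_ratio, effective_prefetching_ratio).

    Assumed definitions (the paper's exact ones may differ):
      hit ratio                   = accesses served by a prefetched block / all accesses
      effective prefetching ratio = prefetched blocks later accessed / all prefetched blocks
    The prefetch buffer is unbounded (no eviction) to keep the sketch short.
    """
    prefetched = set()  # every distinct block the prefetcher has fetched so far
    used = set()        # prefetched blocks that were later accessed
    hits = 0
    history = []
    for lba in trace:
        if lba in prefetched:
            hits += 1
            used.add(lba)
        history.append(lba)
        prefetched.update(prefetcher(history))
    hit_ratio = hits / len(trace) if trace else 0.0
    effective = len(used) / len(prefetched) if prefetched else 0.0
    return hit_ratio, effective


if __name__ == "__main__":
    sequential = list(range(8))                   # deltas all +1
    non_sequential = [0, 8, 5, 13, 10, 18, 15, 23]  # alternating +8 / -3 deltas
    print(evaluate(sequential, sequential_prefetcher))      # (0.875, 0.875)
    print(evaluate(non_sequential, sequential_prefetcher))  # (0.0, 0.0)
```

On the sequential trace the next-block prefetcher misses only the first access, while on the alternating +8/-3 trace none of its prefetches are ever used. That gap on regular but non-sequential delta patterns is what delta-aware learned prefetchers such as SGDP aim to close.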