Distance based prefetching algorithms for mining of the sporadic requests associations

Vadim Voevodkin, Andrey Sokolov
2024-06-13
Abstract: Modern storage systems make intensive use of data prefetching algorithms when processing sequences of read requests. The performance of the prefetching algorithm (for instance, the increase in the cache hit ratio, CHR, of the cache system) directly affects the overall performance characteristics of the storage system (read latency, IOPS, etc.). Widely known prefetching algorithms focus on discovering sequential patterns in the stream of requests. This study examines a family of prefetching algorithms focused on mining pseudo-random (sporadic) patterns between read requests: sporadic prefetching algorithms. The key contribution of this paper is a new, lightweight family of distance-based sporadic prefetching algorithms (DBSP) that outperforms the best previously known results on the MSR traces collection. Another important contribution of this paper is a thorough description of the procedure for comparing the performance of sporadic prefetchers.
Databases
What problem does this paper attempt to address?
The problem that this paper attempts to solve is the performance optimization of read requests in modern storage systems, in particular how to improve the cache hit ratio (CHR) and reduce access latency by improving the prefetching algorithm. Specifically, the paper focuses on mining the pseudo-random (sporadic, i.e., non-sequential) patterns between read requests to improve the effectiveness of the prefetching algorithm.

### Main Problem Description

1. **Limitations of Existing Prefetching Algorithms**:
   - Existing prefetching algorithms mainly focus on finding sequential patterns, and they are not effective in dealing with pseudo-random (sporadic) patterns.
2. **Performance Bottlenecks in Storage Systems**:
   - The overall performance of a storage system is characterized by cache hit ratio, read latency, and input/output operations per second (IOPS). Existing prefetching algorithms perform poorly on sporadic patterns, so the storage system's performance is under-utilized.

### Goals of the Paper

The main goal of the paper is to propose a new distance-based sporadic prefetching algorithm (Distance Based Sporadic Prefetcher, DBSP) that can effectively mine the pseudo-random patterns between read requests, thereby improving the cache hit ratio and the overall performance of storage systems.

### Key Points of the Solution

1. **Introduction of the DBSP Algorithm**:
   - The DBSP algorithm calculates the degree of association between different requests by analyzing the timestamps of historical read requests, and predicts future read requests based on these association degrees.
2. **Optimization of Parameter Configuration**:
   - With its parameters optimized on the MSR traces collection, the DBSP algorithm outperforms Mithril, the best existing algorithm, by about 2% in cache hit ratio (CHR) and by 3% in precision.
3. **Detailed Evaluation Method**:
   - The paper also provides a detailed evaluation method for prefetching algorithms to support the reproducibility and reliability of experimental results. This method considers not only the cache hit ratio but also additional indicators such as the storage activity ratio (SAR) to evaluate the performance of the prefetching algorithm more comprehensively.

### Conclusion

By introducing the DBSP algorithm, the paper addresses the shortcomings of existing prefetching algorithms in dealing with sporadic patterns and reports an improvement of about 2% in cache hit ratio over Mithril on the MSR traces. In addition, the evaluation method proposed in the paper provides a benchmark procedure for future research.

### Formula Summary

- **Cache Hit Ratio (CHR)**:
  \[ \text{CHR}=\frac{n_{\text{ch}}}{n_{\text{ch}} + n_{\text{cm}}} \]
  where \(n_{\text{ch}}\) is the number of cache hits and \(n_{\text{cm}}\) is the number of cache misses.
- **Precision**:
  \[ \text{Precision}=\frac{n_{\text{pr}}-n_{\text{eu}}}{n_{\text{pr}}} \]
  where \(n_{\text{pr}}\) is the number of prefetched requests and \(n_{\text{eu}}\) is the number of prefetched requests that are never used.
- **Storage Activity Ratio (SAR)**:
  \[ \text{SAR}=\frac{n_{\text{dp}}}{n_{\text{dc}}} \]
  where \(n_{\text{dp}}\) is the number of blocks fetched from the storage system by the cache with a prefetcher, and \(n_{\text{dc}}\) is the number of blocks fetched from the storage system when only the cache is used.

Using these metrics, the paper evaluates the performance of the DBSP algorithm and demonstrates its effectiveness in dealing with sporadic patterns.
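
To make the three metrics from the formula summary concrete, the following minimal Python sketch computes CHR, Precision, and SAR from raw counters. The `PrefetchCounters` class, its field names, and the sample numbers are hypothetical and chosen only for illustration; they are not taken from the paper. Returning 0.0 for an empty denominator in CHR and Precision is likewise an assumed convention, not something the paper specifies.

```python
from dataclasses import dataclass


@dataclass
class PrefetchCounters:
    """Hypothetical container for the counters used in the formulas."""
    cache_hits: int              # n_ch
    cache_misses: int            # n_cm
    prefetched: int              # n_pr
    prefetched_unused: int       # n_eu
    blocks_with_prefetcher: int  # n_dp
    blocks_cache_only: int       # n_dc


def cache_hit_ratio(c: PrefetchCounters) -> float:
    """CHR = n_ch / (n_ch + n_cm)."""
    total = c.cache_hits + c.cache_misses
    # Assumed convention: no requests at all yields a ratio of 0.0.
    return c.cache_hits / total if total else 0.0


def precision(c: PrefetchCounters) -> float:
    """Precision = (n_pr - n_eu) / n_pr."""
    # Assumed convention: nothing prefetched yields a precision of 0.0.
    if c.prefetched == 0:
        return 0.0
    return (c.prefetched - c.prefetched_unused) / c.prefetched


def storage_activity_ratio(c: PrefetchCounters) -> float:
    """SAR = n_dp / n_dc."""
    if c.blocks_cache_only == 0:
        raise ValueError("SAR is undefined when the cache-only run fetched no blocks")
    return c.blocks_with_prefetcher / c.blocks_cache_only


# Illustrative counters only; these are not results from the paper.
example = PrefetchCounters(
    cache_hits=600,
    cache_misses=400,
    prefetched=200,
    prefetched_unused=50,
    blocks_with_prefetcher=450,
    blocks_cache_only=400,
)

print("CHR:", cache_hit_ratio(example))
print("Precision:", precision(example))
print("SAR:", storage_activity_ratio(example))
```

Under these definitions, a SAR above 1 means the prefetcher causes more blocks to be read from storage than the cache alone would, so a gain in CHR can be weighed against the extra backend load it costs.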