A dedicated adaptive loop pre-fetch mechanism for stream-like application

Xiaoping Huang, Xiaoya Fan, YuHui Chen, Xiangdong He
DOI: https://doi.org/10.1109/ICSICT.2010.5667325
2010-01-01
Abstract: For stream-like applications that demand high bandwidth and low latency, reducing memory latency can effectively improve QoS. In this paper, we propose a dedicated adaptive loop pre-fetch mechanism that reduces memory latency and improves pre-fetch accuracy. When a loop sequence is detected, the stream pre-fetch engine adaptively initiates the pre-fetch operation and stores the returned data in on-chip stream buffers. The pre-fetch engine consists of loop sequence recognition logic, stream buffer FIFOs, and an address calculation ALU. A hardware engine is implemented and integrated into a processor to verify the mechanism. When the processor with the pre-fetch engine runs a regular loop sequence, it saves between 1/2 and 2/3 of the time spent on memory latency. The mechanism also alleviates cache pollution and cache thrashing. ©2010 IEEE.
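The interplay of the three components named in the abstract (loop sequence recognition, stream buffer FIFO, address calculation) can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the authors' hardware design: the abstract does not say how loop sequences are recognized, so the model assumes a regular loop shows up as a constant address stride. The class name `StridePrefetcher`, the helper `run_trace`, and the parameters `depth` and `confirm` are all hypothetical.

```python
from collections import deque


class StridePrefetcher:
    """Toy model: recognize a constant-stride address stream, then pre-fetch
    ahead of it into a small FIFO. All names and defaults are hypothetical."""

    def __init__(self, depth=4, confirm=2):
        self.depth = depth        # FIFO capacity in entries (assumed value)
        self.confirm = confirm    # repeated strides required before pre-fetching
        self.fifo = deque()       # pre-fetched addresses, oldest first
        self.last_addr = None
        self.stride = None
        self.matches = 0          # consecutive accesses showing the same stride
        self.next_prefetch = None

    def access(self, addr):
        """Process one demand access; return True if the FIFO head served it."""
        hit = bool(self.fifo) and self.fifo[0] == addr
        if hit:
            self.fifo.popleft()
        else:
            # The stream was broken (or never started): discard stale entries.
            self.fifo.clear()
            self.next_prefetch = None

        # "Loop sequence recognition": track whether the stride repeats.
        if self.last_addr is not None:
            s = addr - self.last_addr
            if s == self.stride:
                self.matches += 1
            else:
                self.stride = s
                self.matches = 1
        self.last_addr = addr

        # "Address calculation": once the stride is confirmed, fill the FIFO.
        if self.matches >= self.confirm and self.stride != 0:
            if self.next_prefetch is None:
                self.next_prefetch = addr + self.stride
            while len(self.fifo) < self.depth:
                self.fifo.append(self.next_prefetch)
                self.next_prefetch += self.stride
        return hit


def run_trace(addresses, depth=4, confirm=2):
    """Feed a list of addresses through a fresh model; return the hit flags."""
    engine = StridePrefetcher(depth=depth, confirm=confirm)
    return [engine.access(a) for a in addresses]


if __name__ == "__main__":
    # A regular loop walking memory with stride 4.
    print(run_trace([0, 4, 8, 12, 16, 20]))
    # prints [False, False, False, True, True, True]
```

In this model the first three accesses miss while the stride is being confirmed, and later accesses are served from the FIFO rather than the cache, which is the sense in which a stream buffer can avoid polluting the cache. The model has no notion of timing, so it says nothing about the latency savings reported in the abstract.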