Decoupled Vector Runahead for Prefetching Nested Memory-Access Chains

Ajeya Naithani, Jaime Roelandts, Sam Ainsworth, Timothy M. Jones, Lieven Eeckhout
DOI: https://doi.org/10.1109/mm.2024.3406891
IF: 2.8212
2024-08-29
IEEE Micro
Abstract: Decoupled vector runahead (DVR) exploits massive amounts of memory-level parallelism to improve the performance of applications that feature indirect memory accesses by dynamically inferring loop bounds at runtime, recognizing striding loads, and speculatively vectorizing the subsequent instructions that are part of an indirect chain. DVR runs as an on-demand, speculative, in-order, lightweight hardware subthread alongside the main thread within the core. DVR incurs minimal hardware overhead while delivering a substantial performance boost.
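To make the access pattern in the abstract concrete, the sketch below shows a loop with a striding load (idx[i]) feeding a dependent indirect load (data[idx[i]]). The second function is a software analogy only: it prefetches the indirect target a fixed number of iterations ahead and stops at the loop bound. It is not the DVR hardware mechanism, which the abstract describes as a speculative subthread inside the core. The function names, the LOOKAHEAD constant, and the use of the GCC/Clang builtin __builtin_prefetch are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookahead distance (in loop iterations); not from the paper. */
#define LOOKAHEAD 8

/* Baseline indirect chain: idx[i] is a striding load, and data[idx[i]]
 * is the dependent load whose address is only known once idx[i] arrives. */
uint64_t sum_indirect(const uint32_t *idx, const uint64_t *data, size_t n)
{
    assert(idx != NULL && data != NULL);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[idx[i]];
    return sum;
}

/* Software analogy: request the indirect target LOOKAHEAD iterations early.
 * The bound check plays the role of knowing the loop trip count, so no
 * index beyond the end of idx is ever read. */
uint64_t sum_indirect_prefetch(const uint32_t *idx, const uint64_t *data,
                               size_t n)
{
    assert(idx != NULL && data != NULL);
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + LOOKAHEAD < n)
            __builtin_prefetch(&data[idx[i + LOOKAHEAD]], 0, 1); /* read hint */
        sum += data[idx[i]];
    }
    return sum;
}
```

Both functions return the same sum; the prefetching variant differs only in when cache misses are initiated. Software prefetching of this kind requires the programmer or compiler to choose the lookahead distance, whereas the abstract states that DVR infers loop bounds and strides dynamically at runtime.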
computer science, software engineering, hardware & architecture