Polaris: Enhancing CXL-based Memory Expanders with Memory-side Prefetching.
Zhe Zhou, Shuotao Xu, Yiqi Chen, Tao Zhang, Ran Shu, Lei Qu, Peng Cheng, Yongqiang Xiong, Guangyu Sun
DOI: https://doi.org/10.1007/978-981-99-7872-4_2
2023-01-01
Abstract: CXL-based memory expanders incur higher latency than local memory because of control and transmission overheads, and this latency gap hurts latency-sensitive tasks. Cache prefetching has traditionally been used to hide memory latency, but closing this performance gap requires better CPU prefetch coverage. Tuning a CPU prefetcher for CXL memory, however, requires costly CPU modifications and can cause cache pollution and wasted memory bandwidth. To address these challenges, we propose Polaris, a novel CXL memory expander that integrates a hardware prefetcher in the CXL memory controller chip. Polaris analyzes incoming memory requests and prefetches cachelines into a dedicated SRAM buffer without requiring modifications to CPUs or software. On a prefetch hit, Polaris provides a “shortcut” for rapid memory access, significantly reducing the performance gap between CXL and local DDR memory. Furthermore, if small CPU changes are allowed, such as extending Intel’s DDIO, Polaris can further reduce CXL memory access overheads by actively pushing high-confidence prefetches to the CPU’s last-level cache (LLC). Extensive experiments demonstrate that, in conjunction with various CPU-side prefetchers, Polaris enables up to 85% of common workloads (on average, 43%) to effectively tolerate CXL memory’s longer latency.
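
The memory-side prefetching idea described in the abstract can be illustrated with a toy model. The sketch below is not Polaris's actual design: the abstract does not specify the prefetch algorithm, so a simple stride detector is assumed here as a stand-in, and the names `MemorySidePrefetcher`, `buffer_lines`, and `degree`, the 64-byte cacheline size, and the LRU buffer policy are all hypothetical choices made for illustration. The model observes the stream of requests arriving at the expander, keeps predicted cachelines in a small buffer (standing in for the dedicated SRAM), and reports whether each request would hit that buffer. The optional push of high-confidence prefetches to the LLC is not modeled.

```python
from collections import OrderedDict

CACHELINE = 64  # bytes; assumed cacheline size


class MemorySidePrefetcher:
    """Toy model of a prefetcher inside a memory expander's controller.

    Assumption: a single-stream stride detector stands in for the
    unspecified prefetch algorithm.
    """

    def __init__(self, buffer_lines=4, degree=1):
        self.buffer = OrderedDict()       # prefetched line numbers, oldest first
        self.buffer_lines = buffer_lines  # capacity of the modeled SRAM buffer
        self.degree = degree              # lines prefetched per trigger
        self.last_line = None
        self.last_stride = None
        self.hits = 0
        self.misses = 0

    def _insert(self, line):
        # Add a prefetched line and evict the oldest entry when full.
        self.buffer[line] = True
        self.buffer.move_to_end(line)
        while len(self.buffer) > self.buffer_lines:
            self.buffer.popitem(last=False)

    def access(self, addr):
        """Process one incoming request; return True on a buffer hit."""
        line = addr // CACHELINE
        hit = line in self.buffer
        if hit:
            self.hits += 1
            del self.buffer[line]  # the line is consumed by the request
        else:
            self.misses += 1

        # Prefetch only when the same non-zero stride is seen twice in a row.
        if self.last_line is not None:
            stride = line - self.last_line
            if stride != 0 and stride == self.last_stride:
                for i in range(1, self.degree + 1):
                    self._insert(line + i * stride)
            self.last_stride = stride
        self.last_line = line
        return hit


if __name__ == "__main__":
    pf = MemorySidePrefetcher()
    # Sequential scan over eight consecutive cachelines.
    results = [pf.access(i * CACHELINE) for i in range(8)]
    print(results, pf.hits, pf.misses)
```

In this toy run the first three requests miss while the stride is being learned, and the remaining five hit the buffer. A request that hits would be served from the buffer rather than from DRAM behind the controller, which is the kind of "shortcut" the abstract refers to; the actual latency savings depend on the hardware and are not modeled here.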