Stream-Based Data Placement for Near-Data Processing with Extended Memory

Yiwei Li, Boyu Tian, Yi Ren, Mingyu Gao
DOI: https://doi.org/10.1109/micro61859.2024.00120
2024-01-01
Abstract: The data access bottleneck in memory-intensive applications has motivated various architectural innovations in the main memory system, with Near-Data Processing (NDP) and Compute Express Link (CXL) as two prominent recent examples. In this work, we address the memory capacity limitation of 3D-stacked NDP systems using CXL-based extended memory, where the DRAM space of the 3D NDP stacks serves as a cache for the CXL-based memory. This architecture, however, poses two unique challenges: significant interconnect latency and expensive metadata management. We propose NDPExt, a hardware-software co-design approach to achieve efficient NDP with extended memory. On the hardware side, NDPExt uses coarse-grained data streams rather than conventional fine-grained cachelines to manage the NDP stacks as a distributed DRAM cache, which reduces metadata cost and allows custom optimizations to be applied to different data. On the software side, NDPExt periodically derives an optimized cache configuration that allocates DRAM cache space to each stream based on profiled miss behaviors. The configuration co-optimizes capacity sizing, spatial placement, and data replication. Combining the two techniques allows NDPExt to improve performance by 1.41× on average and up to 2.43× over state-of-the-art cache management solutions.
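To make the idea of sizing per-stream cache capacity from profiled miss behavior more concrete, the sketch below shows a generic greedy partitioning heuristic. It is not the algorithm from the paper, which the abstract does not specify and which also co-optimizes spatial placement and replication. The function name `allocate_capacity`, the miss-curve input format, and the notion of a capacity "unit" are all assumptions made for illustration.

```python
from typing import Dict, List


def allocate_capacity(miss_curves: Dict[str, List[int]], total_units: int) -> Dict[str, int]:
    """Greedily split `total_units` of DRAM cache capacity among streams.

    Hypothetical input format: miss_curves[s][k] is the profiled miss count
    of stream s when it is given k capacity units. This is a generic
    marginal-gain heuristic, not NDPExt's actual configuration algorithm.
    """
    alloc = {stream: 0 for stream in miss_curves}
    for _ in range(total_units):
        best_stream, best_gain = None, 0
        for stream, curve in miss_curves.items():
            k = alloc[stream]
            if k + 1 < len(curve):
                # Misses avoided by giving this stream one more unit.
                gain = curve[k] - curve[k + 1]
                if gain > best_gain:
                    best_stream, best_gain = stream, gain
        if best_stream is None:
            # No stream benefits from more capacity; leave the rest unused.
            break
        alloc[best_stream] += 1
    return alloc


# Example with two made-up streams and three capacity units.
curves = {"A": [100, 40, 30, 28], "B": [80, 50, 35, 30]}
print(allocate_capacity(curves, 3))  # {'A': 1, 'B': 2}
```

A greedy pass like this minimizes total misses only when every miss curve has diminishing returns (is convex); with non-convex curves it can miss a better split, which is one reason a real system might use a more elaborate optimizer.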