Bridge-NDP: Achieving Efficient Communication-Computation Overlap in Near Data Processing with Bridge Architecture

Liyan Chen, Jianfei Jiang, Qin Wang, Zhigang Mao, Naifeng Jing
DOI: https://doi.org/10.1109/asp-dac58780.2024.10473860
2024-01-01
Abstract: Near data accelerators (NDAs) enable near data processing (NDP) within main memory, which improves performance by providing higher aggregate bandwidth and reducing long-distance data transfer. Most prior work focuses on exploiting higher internal bandwidth to improve the performance of the NDA itself. However, the overhead of interactive communication between the host and the NDAs has been overlooked, and it has become the bottleneck of NDP systems. In this paper, we propose bridge-NDP, a novel NDP architecture that uses existing memory buses as bridge buses to fully utilize bandwidth. With bridge access enabled by optimized bridge commands, bridge-NDP efficiently overlaps communication and computation. It can be applied to existing NDP systems regardless of the memory level to which the NDAs are attached. For a variety of key computing kernels from machine learning, data analytics, and other domains, our evaluation shows that bridge-NDP speeds up not only the NDA itself (1.13×-3.62×) but also host-NDA collaboration (2.43×-4.21×), and achieves higher bandwidth utilization (1.12×-3.67× and 1.48×-4.13×) than the state-of-the-art NDP solution.