An Efficient Near-Bank Processing Architecture for Personalized Recommendation System

Yuqing Yang, Weidong Yang, Qin Wang, Naifeng Jing, Jianfei Jiang, Zhigang Mao, Weiguang Sheng
DOI: https://doi.org/10.1145/3566097.3567857
2023-01-01
Abstract: Personalized recommendation systems consume a major share of the resources in modern AI data centers. Their memory-bound embedding layers, which exhibit irregular memory access patterns, have been identified as the bottleneck of these systems. Near-memory processing (NMP), which provides high bandwidth, is an effective way to overcome these memory challenges. Recent work proposes an NMP approach that accelerates recommendation models by utilizing the through-silicon via (TSV) bandwidth of 3D-stacked DRAMs. However, the total bandwidth provided by the TSVs is insufficient for a batch of embedding layers processed in parallel. In this paper, we propose a near-bank processing architecture to accelerate recommendation models. By integrating compute logic near the memory banks on the DRAM dies of the 3D-stacked DRAM, our architecture can exploit the enormous bank-level bandwidth, which is much higher than the TSV bandwidth. We also present a hardware/software interface for offloading embedding layers. Moreover, we propose an efficient mapping scheme to improve the utilization of bank-level bandwidth. As a result, our architecture achieves up to 2.10× speedup and 31% energy saving for data movement over the state-of-the-art NMP solution for recommendation acceleration based on 3D-stacked memory.
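The abstract does not spell out the embedding-layer operation itself. As a rough illustration of why such layers are memory-bound with irregular accesses, the following minimal Python sketch assumes a gather-and-sum-pool lookup, a common form of embedding layer but not necessarily the exact operator the paper evaluates. The names `embedding_bag_sum` and `make_table`, the table size, and the index lists are all hypothetical.

```python
import random


def make_table(num_rows, dim, seed=0):
    """Build a toy embedding table (hypothetical sizes, random values)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(num_rows)]


def embedding_bag_sum(table, indices):
    """Gather the rows named by `indices` and sum them element-wise.

    Each index touches a different, data-dependent row, so the reads are
    scattered across the table, while the arithmetic is only one addition
    per element fetched.
    """
    dim = len(table[0])
    pooled = [0.0] * dim
    for idx in indices:
        row = table[idx]          # irregular, data-dependent memory read
        for d in range(dim):
            pooled[d] += row[d]   # one add per element read
    return pooled


# A batch of lookups: each sample brings its own list of sparse indices.
table = make_table(num_rows=1000, dim=4)
batch = [[3, 981, 42], [7, 7, 512, 64]]
pooled_batch = [embedding_bag_sum(table, ids) for ids in batch]
```

In this sketch every element read from the table is used in a single addition, so throughput is set by how fast rows can be fetched rather than by compute, which is the kind of workload that benefits from the higher bandwidth available near the memory banks.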