A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System

Lingfei Lu, Yudi Qiu, Shiyan Yi, Yibo Fan
DOI: https://doi.org/10.1109/lca.2023.3305668
IF: 2.3
2023-01-01
IEEE Computer Architecture Letters
Abstract: Personalized recommendation systems (RS) are widely used in industry and account for a large share of compute time in AI computing centers. A critical component of RS is the embedding layer, which consists of sparse embedding lookups and is memory-bound. Recent works have proposed near-memory processing (NMP) architectures that exploit high internal memory bandwidth to speed up embedding lookups. These NMP works partition embedding vectors either horizontally or vertically. However, the effectiveness of horizontal or vertical partitioning is hard to guarantee across different memory configurations and embedding vector sizes. To address this issue, we propose FeaNMP, a flexible embedding-aware NMP architecture that accelerates the inference phase of RS. We explore different partitioning strategies in detail and design a flexible way to select the optimal one depending on the embedding dimension and DDR configuration. As a result, compared to the state-of-the-art rank-level NMP work RecNMP, our work achieves up to 11.1× speedup for embedding layers under mixed-dimension workloads.
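To make the two partitioning styles named in the abstract concrete, the following is a minimal Python sketch, not the FeaNMP design itself. It assumes that "horizontal" means whole embedding vectors are distributed across memory units by row, and that "vertical" means every vector is split across units by column; the abstract does not define these terms, so this mapping is an assumption. The function names, the modulo row placement, and the sum pooling are all hypothetical choices made for illustration.

```python
from typing import List

Table = List[List[float]]


def pooled_lookup(table: Table, indices: List[int]) -> List[float]:
    """Baseline: gather the rows named by `indices` and sum them element-wise."""
    dim = len(table[0])
    out = [0.0] * dim
    for i in indices:
        for d in range(dim):
            out[d] += table[i][d]
    return out


def row_partitioned_lookup(table: Table, indices: List[int], units: int) -> List[float]:
    """Assumed 'horizontal' scheme: unit u owns every row i with i % units == u.

    Each unit produces a partial sum over the rows it owns; the host then
    reduces the partial sums into the final pooled vector.
    """
    dim = len(table[0])
    partials = [[0.0] * dim for _ in range(units)]
    for i in indices:
        u = i % units  # hypothetical placement rule
        for d in range(dim):
            partials[u][d] += table[i][d]
    return [sum(p[d] for p in partials) for d in range(dim)]


def column_partitioned_lookup(table: Table, indices: List[int], units: int) -> List[float]:
    """Assumed 'vertical' scheme: unit u owns a contiguous slice of every vector.

    Every unit touches all requested rows but only its own columns, and the
    host concatenates the per-unit slices.
    """
    dim = len(table[0])
    chunk = (dim + units - 1) // units  # ceiling division
    out: List[float] = []
    for u in range(units):
        lo, hi = u * chunk, min((u + 1) * chunk, dim)
        for d in range(lo, hi):
            out.append(sum(table[i][d] for i in indices))
    return out


if __name__ == "__main__":
    table = [[float(r * 10 + c) for c in range(4)] for r in range(6)]
    print(pooled_lookup(table, [0, 3, 5]))
    print(row_partitioned_lookup(table, [0, 3, 5], units=2))
    print(column_partitioned_lookup(table, [0, 3, 5], units=2))
```

All three functions return the same pooled vector; what differs is which unit reads which bytes. In the row-wise sketch, load balance depends on which indices are requested, while in the column-wise sketch each unit's slice shrinks as the number of units grows relative to the embedding dimension. This is one plausible reading of why, as the abstract notes, neither scheme is reliably effective across embedding sizes and memory configurations.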