Abstract: Latent factor models are the dominant backbones of contemporary recommender systems (RSs) given their performance advantages, where each entity (commonly a user or item) is represented by a unique vector embedding of fixed dimensionality (e.g., 128). Due to the large number of users and items on e-commerce sites, the embedding table is arguably the least memory-efficient component of RSs. For any lightweight recommender that aims to scale efficiently with the growing number of users and items, or to remain applicable in resource-constrained settings, existing solutions either reduce the number of embeddings needed via hashing, or sparsify the full embedding table to switch off selected embedding dimensions. However, as hash collisions arise or embeddings become overly sparse, especially when adapting to a tighter memory budget, those lightweight recommenders inevitably have to compromise their accuracy. To this end, we propose a novel compact embedding framework for RSs, namely Compositional Embedding with Regularized Pruning (CERP). Specifically, CERP represents each entity by combining a pair of embeddings from two independent, substantially smaller meta-embedding tables, which are then jointly pruned via a learnable element-wise threshold. In addition, we design a regularized pruning mechanism in CERP, such that the two sparsified meta-embedding tables are encouraged to encode mutually complementary information. Since CERP is agnostic to the choice of latent factor model, we pair it with two popular recommendation models for extensive experiments, where results on two real-world datasets under different memory budgets demonstrate its superiority over state-of-the-art baselines. The codebase of CERP is available at https://github.com/xurong-liang/CERP.
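
The idea of composing each entity embedding from two small, pruned meta-embedding tables can be illustrated with a minimal sketch. It is not the CERP implementation: the abstract does not say how entities are mapped to row pairs, how the two rows are combined, or what form the threshold takes. The sketch assumes a quotient-remainder index mapping, element-wise summation, and soft thresholding with fixed (not learned) thresholds; the complementarity regularizer is omitted. All names (`make_table`, `soft_threshold`, `compose`) are hypothetical.

```python
import math
import random


def make_table(rows, dim, seed):
    """Build a small random meta-embedding table (rows x dim)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(rows)]


def soft_threshold(table, thresholds):
    """Element-wise pruning (assumed form): shrink each weight toward zero
    by its own threshold; weights with |w| <= threshold become zero."""
    return [
        [math.copysign(max(abs(w) - t, 0.0), w) for w, t in zip(row, t_row)]
        for row, t_row in zip(table, thresholds)
    ]


def compose(entity_id, table_p, table_q):
    """Assumed mapping: the quotient selects a row of P, the remainder a row
    of Q, and the two rows are summed to form the entity embedding."""
    i, j = divmod(entity_id, len(table_q))
    return [a + b for a, b in zip(table_p[i], table_q[j])]


num_entities = 10_000
dim = 8
bucket = math.ceil(math.sqrt(num_entities))  # rows per meta-embedding table

P = make_table(bucket, dim, seed=0)
Q = make_table(bucket, dim, seed=1)

# In CERP the thresholds are learnable; here they are fixed for illustration.
thr_p = [[0.3] * dim for _ in range(bucket)]
thr_q = [[0.3] * dim for _ in range(bucket)]

P_sparse = soft_threshold(P, thr_p)
Q_sparse = soft_threshold(Q, thr_q)

vec = compose(4217, P_sparse, Q_sparse)

full_params = num_entities * dim   # size of a conventional embedding table
meta_params = 2 * bucket * dim     # size of the two meta-embedding tables
nonzero = sum(w != 0.0 for t in (P_sparse, Q_sparse) for row in t for w in row)
print(f"full table: {full_params} weights, meta tables: {meta_params} weights")
print(f"non-zero weights after pruning: {nonzero}")
```

With this mapping every entity receives a distinct pair of rows, so two entities never share both meta-embeddings even though each table is far smaller than the full one; raising the thresholds trades accuracy for a tighter memory budget.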

A Flexible Embedding-Aware Near Memory Processing Architecture for Recommendation System

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

ESPN: Memory-Efficient Multi-Vector Information Retrieval

NDRec: A Near-Data Processing System for Training Large-Scale Recommendation Models

iMARS: An In-Memory-Computing Architecture for Recommendation Systems

MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

Optimizing Inference Quality with SmartNIC for Recommendation System

Near-Memory Processing in Action: Accelerating Personalized Recommendation with AxDIMM

Disaggregating Embedding Recommendation Systems with FlexEMR

Mem-Rec: Memory Efficient Recommendation System using Alternative Representation

CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models

CmnRec: Sequential Recommendations with Chunk-accelerated Memory Network

UpDLRM: Accelerating Personalized Recommendation using Real-World PIM Architecture

Rerec: In-ReRAM Acceleration with Access-Aware Mapping for Personalized Recommendation

Mixed-Precision Embedding Using a Cache

Learning Compact Compositional Embeddings via Regularized Pruning for Recommendation

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference

RNE: A Scalable Network Embedding for Billion-scale Recommendation

Position-aware Compositional Embeddings for Compressed Recommendation Systems

Sequential Recommendation with User Memory Networks

Learnable Embedding Sizes for Recommender Systems