Abstract: On-device session-based recommendation systems have been attracting increasing attention on account of their low energy and resource consumption and their privacy protection, while still providing promising recommendation performance. To fit powerful neural session-based recommendation models onto resource-constrained mobile devices, tensor-train decomposition and its variants have been widely applied to reduce the memory footprint by decomposing the embedding table into smaller tensors, showing great potential for compressing recommendation models. However, these model compression techniques significantly increase local inference time because of the complex process of generating index lists and the series of tensor multiplications needed to form item embeddings, so the resulting on-device recommender fails to provide real-time responses and recommendations. To improve online recommendation efficiency, we propose to learn compact item representations based on compositional encoding. Specifically, each item is represented by a compositional code that consists of several codewords, and we learn an embedding vector for each codeword instead of for each item. The composition of the codeword embedding vectors from different embedding matrices (i.e., codebooks) then forms the item embedding. Since the codebooks can be extremely small, the recommender model fits on resource-constrained devices and can store the codebooks for fast local inference. Besides, to prevent the loss of model capacity caused by compression, we propose a bidirectional self-supervised knowledge distillation framework. Extensive experimental results on two benchmark datasets demonstrate that, compared with existing methods, the proposed on-device recommender not only achieves an 8x inference speedup with a large compression ratio but also shows superior recommendation performance.
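
The compositional encoding idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `compose_item_embeddings`, the array sizes, and the choice of summation as the composition operator are assumptions made for illustration only, and the codes here are random rather than learned.

```python
import numpy as np


def compose_item_embeddings(codes, codebooks):
    """Build item embeddings from compositional codes (illustrative sketch).

    codes:     int array of shape (num_items, M); codes[i, m] is the index of
               the codeword chosen for item i in codebook m.
    codebooks: float array of shape (M, K, d); M codebooks, each holding
               K codeword embedding vectors of dimension d.
    Returns an array of shape (num_items, d).
    """
    num_codebooks = codebooks.shape[0]
    # Look up one codeword vector per codebook for every item.
    # Broadcasting the (M,) codebook index against the (num_items, M) codes
    # gives an array of shape (num_items, M, d).
    picked = codebooks[np.arange(num_codebooks), codes]
    # Assumption: the codeword vectors are composed by summation.
    return picked.sum(axis=1)


# Hypothetical sizes: 10,000 items, 4 codebooks of 32 codewords, dimension 64.
rng = np.random.default_rng(0)
num_items, M, K, d = 10_000, 4, 32, 64
codebooks = rng.normal(size=(M, K, d))
codes = rng.integers(0, K, size=(num_items, M))

item_embeddings = compose_item_embeddings(codes, codebooks)

# Parameter counts: a full embedding table versus the codebooks alone.
full_table_params = num_items * d
codebook_params = M * K * d
```

With these hypothetical sizes, only the small codebooks and the integer codes need to be kept on the device, and forming an item embedding takes a handful of lookups and additions rather than a chain of tensor multiplications.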

AutoDPQ: Automated Differentiable Product Quantization for Embedding Compression

Differentiable Optimized Product Quantization and Beyond

Beyond Product Quantization: Deep Progressive Quantization for Image Retrieval

Embedding Compression in Recommender Systems: A Survey

Discrete Social Recommendation

DQRM: Deep Quantized Recommendation Models

DPQ: dynamic pseudo-mean mixed-precision quantization for pruned neural network

Mixed-Precision Embeddings for Large-Scale Recommendation Models

Progressive Similarity Preservation Learning for Deep Scalable Product Quantization

Efficient On-Device Session-Based Recommendation

Memory-efficient Embedding for Recommendations

CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models

Shared Predictive Cross-Modal Deep Quantization

Entropy-based Deep Product Quantization for Visual Search and Deep Feature Compression

AutoDim: Field-aware Embedding Dimension Search in Recommender Systems

Deep Product Quantization Module for Efficient Image Retrieval

A Generic Network Compression Framework for Sequential Recommender Systems

Discrete Factorization Machines for Fast Feature-based Recommendation

Product Quantized Collaborative Filtering

Structured Dynamic Precision for Deep Neural Networks Quantization