UniEmbedding: Learning Universal Multi-Modal Multi-Domain Item Embeddings Via User-View Contrastive Learning

Boqi Dai, Zhaocheng Du, Jieming Zhu, Jintao Xu, Deqing Zou, Quanyu Dai, Zhenhua Dong, Rui Zhang, Hai-Tao Zheng
DOI: https://doi.org/10.1145/3627673.3680098
2024-01-01
Abstract: Learning high-quality item embeddings is crucial for recommendation tasks such as matching and ranking. However, existing methods often rely on ID-based item embeddings learned end-to-end with downstream recommendation models, which may suffer from overfitting and limited generalizability. In this paper, we aim to learn universal item embeddings (dubbed UniEmbedding) that capture multi-modal semantics, generalize across multiple domains, and serve different downstream tasks. To achieve this goal, we introduce the UniEmbedding pretraining framework, which includes three modules: a domain-aware multi-modal adapter, a user-view projection module, and contrastive learning objectives across domains. Compared to naive ID embeddings, UniEmbedding provides rich semantic information that generalizes more effectively across domains. Unlike multi-modal embeddings directly extracted from off-the-shelf pretrained models, UniEmbedding achieves better alignment between content semantics and behaviors. We evaluated UniEmbedding on both public and industrial datasets, demonstrating its effectiveness in matching and ranking tasks. Furthermore, UniEmbedding has been deployed in multiple recommendation applications at Huawei, resulting in significant gains in user engagement metrics.
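The abstract does not spell out the form of the contrastive objective. As a rough illustration only, the sketch below assumes an InfoNCE-style loss with in-batch negatives, a common choice for aligning two views of the same item (for example, a content-based embedding and a behavior-based, user-view embedding). The function names, the temperature value, and the random data are hypothetical and are not taken from the paper.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE loss with in-batch negatives (an assumed objective, not the paper's).

    Row i of `anchor` and row i of `positive` are two views of the same item;
    every other row in the batch serves as a negative.
    """
    a = l2_normalize(anchor)
    p = l2_normalize(positive)
    logits = a @ p.T / temperature                        # (batch, batch) similarities
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))             # matched pairs sit on the diagonal

# Hypothetical data: 8 items, 16-dimensional embeddings.
rng = np.random.default_rng(0)
item_view = rng.normal(size=(8, 16))
aligned_loss = info_nce(item_view, item_view + 0.01 * rng.normal(size=(8, 16)))
unrelated_loss = info_nce(item_view, rng.normal(size=(8, 16)))
```

Under these assumptions, the loss is close to zero when the two views of each item nearly coincide and grows when the second view is unrelated to the first, which is the sense in which a contrastive objective pulls content semantics and behavior signals into alignment.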