Embedding-based Product Retrieval in Taobao Search

Sen Li,Fuyu Lv,Taiwei Jin,Guli Lin,Keping Yang,Xiaoyi Zeng,Xiao-Ming Wu,Qianli Ma
DOI: https://doi.org/10.48550/arXiv.2106.09297
2021-06-17
Abstract: Nowadays, the product search service of e-commerce platforms has become a vital shopping channel in people's life. The retrieval phase of products determines the search system's quality and gradually attracts researchers' attention. Retrieving the most relevant products from a large-scale corpus while preserving personalized user characteristics remains an open question. Recent approaches in this domain have mainly focused on embedding-based retrieval (EBR) systems. However, after a long period of practice on Taobao, we find that the performance of the EBR system is dramatically degraded due to its: (1) low relevance with a given query and (2) discrepancy between the training and inference phases. Therefore, we propose a novel and practical embedding-based product retrieval model, named Multi-Grained Deep Semantic Product Retrieval (MGDSPR). Specifically, we first identify the inconsistency between the training and inference stages, and then use the softmax cross-entropy loss as the training objective, which achieves better performance and faster convergence. Two efficient methods are further proposed to improve retrieval relevance, including smoothing noisy training data and generating relevance-improving hard negative samples without requiring extra knowledge and training procedures. We evaluate MGDSPR on Taobao Product Search with significant metrics gains observed in offline experiments and online A/B tests. MGDSPR has been successfully deployed to the existing multi-channel retrieval system in Taobao Search. We also introduce the online deployment scheme and share practical lessons of our retrieval system to contribute to the community.
Information Retrieval
What problem does this paper attempt to address?
The paper addresses retrieval relevance in e-commerce product search. It points out that embedding-based retrieval (EBR) systems have two main problems in practice:

1. **Low Relevance**: Existing EBR systems return products with low relevance to the given query, which leads to unsatisfactory retrieval results.
2. **Inconsistency between Training and Inference Stages**: EBR systems are usually trained with random negative samples, whereas inference must select the top \(K\) products closest to the current query from all candidate products, which requires the model to compare candidates globally. Existing training objectives such as hinge loss perform only local comparison, so training and inference behave inconsistently.

To solve these problems, the authors propose a model named Multi-Grained Deep Semantic Product Retrieval (MGDSPR). The main contributions are:

1. **Proposing the MGDSPR Model**: The model dynamically captures the relationship between user query semantics and personalized behaviors and improves the relevance of retrieved products.
2. **Identifying the Inconsistency between Training and Inference Stages**: The authors recommend softmax cross-entropy loss as the training objective, which achieves better performance and faster convergence.
3. **Proposing Two Methods to Improve Retrieval Relevance**:
   - **Smoothing Noisy Training Data**: A temperature parameter \(\tau\) smooths the noise in the training data, which reduces the low relevance caused by fully fitting user click records.
   - **Generating Relevance-Enhanced Hard Negative Samples**: Hard negative samples are generated in the embedding space, which helps the model distinguish positive samples from their nearby samples.
4. **Experimental Verification**: Experiments on large-scale industrial datasets and on Taobao online product search verify the effectiveness of MGDSPR and analyze its impact on each stage of the search system.

Through these improvements, MGDSPR aims to improve the accuracy and relevance of retrieval in e-commerce product search while maintaining the efficiency of the system.
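
The two relevance techniques summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes inner-product scoring between query and item embeddings, a softmax computed over one positive plus a set of sampled negatives, and hard negatives built by linearly interpolating the positive with the highest-scoring sampled negatives. The function names and the parameters `tau`, `top_n`, and `alpha` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax_xent_with_temperature(query, positive, negatives, tau=1.0):
    """Softmax cross-entropy for one query over one positive and n negatives.

    query: (d,), positive: (d,), negatives: (n, d).
    Assumption: relevance is scored by inner product and scaled by 1 / tau.
    """
    # Stack candidates so that the positive item sits at index 0.
    candidates = np.vstack([positive[None, :], negatives])
    logits = candidates @ query / tau
    # Subtract the max logit for numerical stability (does not change the loss).
    logits = logits - logits.max()
    log_probs = logits - np.log(np.exp(logits).sum())
    # The loss is the negative log-probability assigned to the positive item.
    return -log_probs[0]


def mix_hard_negatives(query, positive, negatives, top_n=2, alpha=0.5):
    """Build hard negatives in embedding space (illustrative assumption).

    Picks the top_n negatives that score highest against the query, then
    interpolates each with the positive: alpha * positive + (1 - alpha) * neg.
    """
    scores = negatives @ query
    hardest = negatives[np.argsort(-scores)[:top_n]]
    return alpha * positive[None, :] + (1.0 - alpha) * hardest


# Toy usage with random embeddings (values are arbitrary, not real data).
rng = np.random.default_rng(0)
q = rng.normal(size=8)
pos = rng.normal(size=8)
negs = rng.normal(size=(16, 8))

hard = mix_hard_negatives(q, pos, negs, top_n=4, alpha=0.6)
all_negs = np.vstack([negs, hard])
print(softmax_xent_with_temperature(q, pos, all_negs, tau=2.0))
```

In this sketch a larger `tau` flattens the softmax distribution, so the model is pushed less strongly toward fitting any single clicked item, and a larger `alpha` moves the synthetic negatives closer to the positive, which makes them harder to separate.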