Enhancing Relevance of Embedding-based Retrieval at Walmart

Juexin Lin, Sachin Yadav, Feng Liu, Nicholas Rossi, Praveen R. Suram, Satya Chembolu, Prijith Chandran, Hrushikesh Mohapatra, Tony Lee, Alessandro Magnani, Ciya Liao
DOI: https://doi.org/10.1145/3627673.3680047
2024-08-15
Abstract: Embedding-based neural retrieval (EBR) is an effective search retrieval method in product search for tackling the vocabulary gap between customer search queries and products. The initial launch of our EBR system at Walmart yielded significant gains in relevance and add-to-cart rates [1]. However, despite EBR generally retrieving more relevant products for reranking, we have observed numerous instances of relevance degradation. Enhancing retrieval performance is crucial, as it directly influences product reranking and affects the customer shopping experience. Factors contributing to these degradations include false positives/negatives in the training data and the inability to handle query misspellings. To address these issues, we present several approaches to further strengthen the capabilities of our EBR model in terms of retrieval relevance. We introduce a Relevance Reward Model (RRM) based on human relevance feedback. We utilize RRM to remove noise from the training data and distill it into our EBR model through a multi-objective loss. In addition, we present the techniques to increase the performance of our EBR model, such as typo-aware training, and semi-positive generation. The effectiveness of our EBR is demonstrated through offline relevance evaluation, online AB tests, and successful deployments to live production.
[1] Alessandro Magnani, Feng Liu, Suthee Chaidaroon, Sachin Yadav, Praveen Reddy Suram, Ajit Puthenputhussery, Sijie Chen, Min Xie, Anirudh Kashi, Tony Lee, et al. 2022. Semantic retrieval at Walmart. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 3495-3503.
Information Retrieval
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses relevance problems in the embedding-based retrieval (EBR) system on Walmart's e-commerce platform. Although the initial EBR deployment significantly improved search relevance and the add-to-cart rate, the authors still observe many cases of relevance degradation in practice. They identify two contributing factors:

1. **False positives and false negatives in the training data**: Training data extracted from user interaction logs may contain positives that are actually irrelevant, or negatives that are actually relevant.
2. **Inability to handle misspellings**: Misspellings are common in real search traffic, and the existing model does not handle them well.

To address these problems, the authors propose several methods for improving the retrieval relevance of the EBR model:

- **Relevance Reward Model (RRM)**: A cross-encoder model trained on human relevance feedback. It scores the relevance between a query and a product and is distilled into the EBR model through a multi-objective loss.
- **Label correction**: RRM is used to correct labels in the training data, reducing the impact of false positives.
- **Semi-positive sample generation**: During offline negative sample generation, semi-positive samples related to the query are drawn from the lower positions of the retrieval results.
- **Typo-aware training**: Misspellings are introduced during training to make the model more robust to misspelled queries.

Together, these methods aim to improve the relevance of retrieved results and thereby the customer's shopping experience. Their effectiveness is verified through offline relevance evaluation, online A/B tests, and successful deployment to production.
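
To make the label-correction and misspelling-injection ideas above more concrete, here is a minimal Python sketch. It is not the authors' implementation. The function names, the 0.5 threshold, the choice to relabel rather than drop examples, the three character-level edit types, and especially `toy_score` (a token-overlap stand-in for the cross-encoder RRM) are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple

# (query, product_title, label) where label 1 = positive, 0 = negative.
Example = Tuple[str, str, int]


def toy_score(query: str, title: str) -> float:
    """Hypothetical stand-in for a relevance model: the fraction of query
    tokens that also appear in the product title. A real RRM would be a
    trained cross-encoder, not a token-overlap heuristic."""
    query_tokens = query.lower().split()
    title_tokens = set(title.lower().split())
    if not query_tokens:
        return 0.0
    return sum(tok in title_tokens for tok in query_tokens) / len(query_tokens)


def correct_labels(
    examples: List[Example],
    score_fn: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[Example]:
    """Relabel positives whose relevance score falls below the threshold,
    treating them as likely false positives from interaction logs."""
    corrected = []
    for query, title, label in examples:
        if label == 1 and score_fn(query, title) < threshold:
            label = 0
        corrected.append((query, title, label))
    return corrected


def inject_typo(query: str, rng: random.Random) -> str:
    """Apply one random character-level edit (delete, swap adjacent
    characters, or substitute) to simulate a misspelled query."""
    if len(query) < 2:
        return query
    i = rng.randrange(len(query) - 1)
    op = rng.choice(["delete", "swap", "substitute"])
    if op == "delete":
        return query[:i] + query[i + 1:]
    if op == "swap":
        return query[:i] + query[i + 1] + query[i] + query[i + 2:]
    return query[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + query[i + 1:]


if __name__ == "__main__":
    logged = [
        ("red shoes", "red running shoes", 1),
        ("red shoes", "blue garden hose", 1),  # clicked, but likely irrelevant
        ("red shoes", "green hat", 0),
    ]
    print(correct_labels(logged, toy_score))
    print(inject_typo("red shoes", random.Random(0)))
```

In a training pipeline, a corrected example could be paired with both the original query and a typo-injected copy so that the model sees clean and noisy spellings of the same intent. How the paper actually samples misspellings and mixes them into training is not described in this summary.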