Robust Interaction-Based Relevance Modeling for Online e-Commerce Search

Ben Chen, Huangyu Dai, Xiang Ma, Wen Jiang, Wei Ning
2024-09-25
Abstract: Semantic relevance calculation is crucial for e-commerce search engines, as it ensures that the items selected closely align with customer intent. Inadequate attention to this aspect can detrimentally affect user experience and engagement. Traditional text-matching techniques are prevalent but often fail to capture the nuances of search intent accurately, so neural networks have now become the preferred solution for such complex text matching. Existing methods predominantly employ representation-based architectures, which strike a balance between high traffic capacity and low latency. However, they exhibit significant shortcomings in generalization and robustness when compared to interaction-based architectures. In this work, we introduce a robust interaction-based modeling paradigm to address these shortcomings. It encompasses 1) a dynamic-length representation scheme for expedited inference, 2) a professional terms recognition method to identify subjects and core attributes from complex sentence structures, and 3) a contrastive adversarial training protocol to bolster the model's robustness and matching capabilities. Extensive offline evaluations demonstrate the superior robustness and effectiveness of our approach, and online A/B testing confirms its ability to improve relevance in the same exposure position, resulting in more clicks and conversions. To the best of our knowledge, this method is the first interaction-based approach for large-scale e-commerce search relevance calculation. Notably, we have deployed it for the entire search traffic on Alibaba.com, the largest B2B e-commerce platform in the world.
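The abstract does not say how the dynamic-length scheme speeds up inference. The following is a minimal sketch under one assumption: each batch of query-title pairs is padded only to the smallest length bucket that fits its longest pair, instead of to a fixed maximum. The bucket sizes, the 128-token fixed baseline, the helper names (`pair_length`, `choose_length`, `padded_tokens`), and the whitespace tokenizer are all hypothetical illustrations, not details taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical length buckets and fixed baseline; the paper does not give these values.
BUCKETS: Tuple[int, ...] = (16, 32, 64, 128)
FIXED_MAX_LEN = 128

Pair = Tuple[Sequence[str], Sequence[str]]  # (query tokens, title tokens)


def pair_length(query: Sequence[str], title: Sequence[str]) -> int:
    """Tokens in a BERT-style cross-encoder input: [CLS] query [SEP] title [SEP]."""
    return len(query) + len(title) + 3


def choose_length(lengths: List[int], buckets: Tuple[int, ...] = BUCKETS) -> int:
    """Smallest bucket that fits the longest pair; longer pairs are assumed truncated."""
    longest = max(lengths)
    for bucket in buckets:
        if longest <= bucket:
            return bucket
    return buckets[-1]


def padded_tokens(batch: List[Pair], dynamic: bool = True) -> int:
    """Total token positions the encoder processes for one batch."""
    lengths = [pair_length(q, t) for q, t in batch]
    seq_len = choose_length(lengths) if dynamic else FIXED_MAX_LEN
    return seq_len * len(batch)


# Whitespace tokenization is a simplification; real systems use a subword tokenizer.
example_batch: List[Pair] = [
    ("new apple discount".split(), "fresh red apple 5kg box sale".split()),
    ("iphone case".split(), "silicone case for iphone 15 pro".split()),
]
print(padded_tokens(example_batch), padded_tokens(example_batch, dynamic=False))  # 32 256
```

Short e-commerce queries and titles rarely fill a fixed window, so most of that window is padding. Self-attention cost also grows roughly quadratically with sequence length. As a result, the attention part of the computation shrinks even faster than the 8x drop in token positions shown here.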
Information Retrieval, Computation and Language
What problem does this paper attempt to address?
The paper addresses the challenges of semantic relevance calculation (SRC) in e-commerce search engines, specifically:

1. **Query Intent and Keyword Clarity**
   - SRC in e-commerce search must resolve the ambiguous meanings of short user queries and match them with the most suitable products. For example, the query "new apple discount" may refer to a promotion on fresh fruit, a discount on Apple electronics, or the latest offers from a clothing brand.
   - Product descriptions often contain many loosely related keywords added to gain more exposure, which dilutes the key information and reduces search accuracy.
2. **Trade-off between Latency and Precision**
   - To improve efficiency, e-commerce platforms have shifted from traditional keyword-based search algorithms to more advanced neural-network-based models. These neural models fall into two categories: representation-based models and interaction-based models.
   - Representation-based models encode queries and products separately into compact embedding vectors through a Siamese network architecture. This suits high-traffic online search, but the simplified processing can lead to poor relevance predictions.
   - Interaction-based models excel at capturing subtle semantic relationships and offer higher accuracy and discrimination. However, their high computational cost limits their use in real-time search.
3. **Enhancing Robustness and Generalization**
   - Diverse linguistic expressions across cultural backgrounds make accurate SRC harder. For example, the same discount can be phrased as "50% off sale", "half price promotion", or "discounted by half".
   - To reduce computational overhead, existing methods usually simplify models through pruning or distillation. These simplified models perform poorly on unfamiliar datasets, which exposes their limited robustness and generalization.

In response to these problems, the paper proposes a robust interaction-based relevance modeling method. Its main innovations are:

- **Dynamic-length Representation Scheme**: adaptively adjusts the input token length to fit queries and product descriptions of different lengths, which reduces wasted computation.
- **Technical Term Recognition Strategy**: enriches the model vocabulary and uses Named Entity Recognition (NER) to highlight subjects and core attributes, strengthening the representation of technical terms.
- **Contrastive Adversarial Training (CAT)**: improves the model's robustness and matching ability by optimizing the input and output embedding representations simultaneously.

With these improvements, the method achieves significant performance gains in large-scale e-commerce search. After its deployment on Alibaba.com, the world's largest B2B e-commerce platform, it increased click-through and conversion rates.
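The CAT idea can be illustrated with a toy objective. This is not the paper's implementation; it is a sketch built on several assumptions:

- The input perturbation is computed FGM-style (the Fast Gradient Method): take the loss gradient with respect to the input embedding and rescale it to norm `epsilon`.
- The encoder is a hypothetical element-wise `tanh` with a logistic relevance head.
- The contrastive part is an InfoNCE term that pulls each clean output embedding toward its own adversarial view and away from the other items in the batch.

The function names and the values of `epsilon`, `lam`, and `tau` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def encode(x: Sequence[float]) -> Vector:
    """Hypothetical encoder: element-wise tanh standing in for a transformer output embedding."""
    return [math.tanh(v) for v in x]


def score(h: Sequence[float], w: Sequence[float], b: float) -> float:
    """Logistic relevance head on the output embedding."""
    return 1.0 / (1.0 + math.exp(-(dot(h, w) + b)))


def bce(p: float, y: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one (probability, label) pair."""
    return -(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))


def fgm_perturbation(x: Sequence[float], y: float, w: Sequence[float], b: float,
                     epsilon: float) -> Vector:
    """FGM-style input perturbation: epsilon * g / ||g||, with g = dBCE/dx."""
    h = encode(x)
    p = score(h, w, b)
    # Chain rule for this toy model: dL/dx_k = (p - y) * w_k * (1 - tanh(x_k)^2).
    g = [(p - y) * wk * (1.0 - hk * hk) for wk, hk in zip(w, h)]
    norm = math.sqrt(dot(g, g))
    if norm == 0.0:
        return [0.0] * len(x)
    return [epsilon * gk / norm for gk in g]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na > 0 and nb > 0 else 0.0


def info_nce(clean: List[Vector], adv: List[Vector], tau: float = 0.1) -> float:
    """InfoNCE: each clean embedding should match its own adversarial view, not the others."""
    total = 0.0
    for i, hi in enumerate(clean):
        logits = [cosine(hi, hj) / tau for hj in adv]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(clean)


def cat_loss(batch: List[Tuple[Vector, float]], w: Sequence[float], b: float,
             epsilon: float = 0.5, lam: float = 0.1, tau: float = 0.1) -> float:
    """Clean BCE + adversarial BCE (batch mean) + lam * contrastive term."""
    clean_h: List[Vector] = []
    adv_h: List[Vector] = []
    task = 0.0
    for x, y in batch:
        r = fgm_perturbation(x, y, w, b, epsilon)
        h = encode(x)
        h_adv = encode([xi + ri for xi, ri in zip(x, r)])
        task += bce(score(h, w, b), y) + bce(score(h_adv, w, b), y)
        clean_h.append(h)
        adv_h.append(h_adv)
    return task / len(batch) + lam * info_nce(clean_h, adv_h, tau)


w, b = [1.0, -2.0, 0.5], 0.0
batch = [([0.2, -0.1, 0.4], 1.0), ([-0.3, 0.5, 0.1], 0.0), ([0.6, 0.2, -0.5], 1.0)]
print(round(cat_loss(batch, w, b), 4))
```

This sketch only evaluates the objective. In actual training, the perturbation would be held fixed (no gradient flows through it), and the combined loss would be backpropagated to the encoder parameters. In this toy model the perturbation always lowers the score of a relevant pair and raises the score of an irrelevant one, so the adversarial BCE term is never smaller than the clean one.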