Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce

Zhe Lin, Jiwei Tan, Dan Ou, Xi Chen, Shaowei Yao, Bo Zheng

DOI: https://doi.org/10.1145/3637528.3671559

2024-07-13

Abstract: Text relevance, or text matching between query and product, is an essential technique for e-commerce search systems, ensuring that the displayed products match the intent of the query. Many studies focus on improving the performance of relevance models in search systems. Recently, pre-trained language models such as BERT have achieved promising performance on the text relevance task. While these models perform well on offline test datasets, their high latency remains an obstacle to deploying them in online systems. The two-tower model is widely employed in industrial scenarios because it balances performance with computational efficiency. However, such models are opaque "black boxes", which prevents developers from making targeted optimizations. In this paper, we propose the deep Bag-of-Words (DeepBoW) model, an efficient and interpretable relevance architecture for Chinese e-commerce. Our approach encodes the query and the product into sparse BoW representations, each a set of word-weight pairs. Each weight represents the importance or relevance score between the corresponding word and the raw text. The relevance score is computed by accumulating the matched words between the sparse BoW representations of the query and the product. Compared to popular dense distributed representations, which usually suffer from being black boxes, the main advantage of the proposed representation is that it is highly explainable and intervenable, which greatly benefits the deployment and operation of online search engines. Moreover, the online efficiency of the proposed model is even better than the most efficient inner product form of dense representation ...

Information Retrieval, Artificial Intelligence, Computation and Language

What problem does this paper attempt to address?

The paper addresses text relevance (text matching) in e-commerce search systems, which ensures that the displayed products match the user's query intent. Although pre-trained language models such as BERT perform well on offline test datasets, their high latency is a barrier to deployment in online systems. The two-tower model is widely used in industrial scenarios because it balances performance with computational efficiency, but it is an opaque "black box", which hinders developers from making targeted optimizations. The paper therefore proposes the Deep Bag-of-Words model (DeepBoW), an efficient and interpretable relevance architecture for Chinese e-commerce. The method encodes queries and products as sparse bag-of-words representations, i.e., sets of word-weight pairs. The relevance score is computed by accumulating the weights of the words matched between the query and the product. Compared to popular dense distributed representations, this representation is highly interpretable and intervenable, a significant advantage for the deployment and operation of online search engines. Additionally, the online efficiency of the model even surpasses that of the most efficient dense representations. Experimental results show that DeepBoW achieves an AUC improvement of over 2.1% on three different datasets, and that it has been deployed on Taobao, the largest Chinese e-commerce search engine, serving the entire search traffic for over 6 months. This indicates that the model is both competitive in accuracy and efficient in practical use.
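The matching step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `relevance_score` and `explain`, the toy weights, and the choice to accumulate the product of the two weights for each matched word (a sparse dot product) are all assumptions made for illustration, since the summary does not specify the exact accumulation rule. In DeepBoW the word-weight pairs would come from learned encoders; here they are written by hand.

```python
from typing import Dict, List, Tuple

# A sparse bag-of-words representation: word -> weight.
BoW = Dict[str, float]


def relevance_score(query_bow: BoW, product_bow: BoW) -> float:
    """Accumulate contributions of words present in both representations.

    Assumption: each matched word contributes the product of its two
    weights; the source only says matched words are "accumulated".
    """
    # Iterate over the smaller dict and look words up in the larger one.
    small, large = sorted((query_bow, product_bow), key=len)
    return sum(w * large[word] for word, w in small.items() if word in large)


def explain(query_bow: BoW, product_bow: BoW) -> List[Tuple[str, float]]:
    """Return per-word contributions, largest first, for inspection."""
    contributions = [
        (word, w * product_bow[word])
        for word, w in query_bow.items()
        if word in product_bow
    ]
    return sorted(contributions, key=lambda pair: pair[1], reverse=True)


# Hand-written toy weights (hypothetical, not produced by any model).
query_bow = {"red": 0.5, "dress": 1.0}
product_bow = {"dress": 0.75, "red": 0.5, "cotton": 0.25}

score = relevance_score(query_bow, product_bow)
breakdown = explain(query_bow, product_bow)

# Intervention: an operator overrides one weight without retraining.
product_bow_fixed = dict(product_bow, red=0.0)
score_after_fix = relevance_score(query_bow, product_bow_fixed)
```

Because the score is a sum over matched words, the breakdown shows exactly which words drove a result, and editing a single weight changes the score in a predictable way. This is the sense in which a sparse representation is easier to inspect and intervene on than a dense embedding.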

Robust Interaction-Based Relevance Modeling for Online e-Commerce Search

Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model

Weakly Supervised Co-Training of Query Rewriting and Semantic Matching for E-Commerce

Learning a Product Relevance Model from Click-Through Data in E-Commerce

SPM: Structured Pretraining and Matching Architectures for Relevance Modeling in Meituan Search

Large Language Model based Long-tail Query Rewriting in Taobao Search

Knowledge Distillation based Contextual Relevance Matching for E-commerce Product Search

Enhancing Relevance of Embedding-based Retrieval at Walmart

Unified Vision-Language Representation Modeling for E-Commerce Same-Style Products Retrieval

Towards More Relevant Product Search Ranking Via Large Language Models: An Empirical Study

From Semantic Retrieval to Pairwise Ranking: Applying Deep Learning in E-commerce Search

Explainable LLM-driven Multi-dimensional Distillation for E-Commerce Relevance Learning

Generating vocabulary for global feature representation towards commerce image retrieval

Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search

End-to-End Neural Ranking for eCommerce Product Search: an application of task models and textual embeddings

Embedding-based Product Retrieval in Taobao Search

BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search

Query Rewriting via Cycle-Consistent Translation for E-Commerce Search

A Dynamic Product-aware Learning Model for E-commerce Query Intent Understanding

An Interpretable Ensemble of Graph and Language Models for Improving Search Relevance in E-Commerce