Abstract: We consider the problem of retrieving and ranking items in an eCommerce catalog, often called SKUs, in order of relevance to a user-issued query. The input data for the ranking are the texts of the queries and the textual fields of the SKUs indexed in the catalog. We review the ways in which this problem both resembles and differs from the problems of IR in the context of web search. The differences between the product-search problem and the IR problem of web search necessitate a different approach in terms of both models and datasets. We first review the recent state-of-the-art models for web search IR, distinguishing between two types of model, which we call the distributed type and the local-interaction type. The different types of relevance models developed for IR have complementary advantages and disadvantages when applied to eCommerce product search. Further, we explain why the conventional methods for dataset construction employed in the IR literature fail to produce data that suffice for training or evaluating models for eCommerce product search. We explain how our own approach, applying task modeling techniques to the click-through logs of an eCommerce site, enables the construction of a large-scale dataset for training and robust benchmarking of relevance models. Our experiments consist of applying several of the models from the IR literature to our own dataset. Empirically, we have established that, when applied to our dataset, certain models of local-interaction type reduce ranking errors by one-third compared to the tf-idf baseline. Applied to the same dataset, the distributed models fail to outperform the baseline. As a basis for a deployed system, however, the distributed models have several computational advantages over the local-interaction models. This motivates an ongoing program of work, which we outline at the conclusion of the paper.

Text-Based Product Matching -- Semi-Supervised Clustering Approach

A Clustering-Based Combinatorial Approach to Unsupervised Matching of Product Titles

A Comparison of Supervised Learning to Match Methods for Product Search

End-to-end multi-modal product matching in fashion e-commerce

Optimizing Product Matching in E-Commerce with DOC2VEC: Leveraging Hierarchical Clustering Parameters Based on Product Titles

Semantic Product Search for Matching Structured Product Catalogs in E-Commerce

ISCAS_ICIP at MWPD-2020 Task 1: Product Matching Based on Deep Entity Matching Frameworks

Machine Learning Based Product Classification for eCommerce

ITEm: Unsupervised Image-Text Embedding Learning for eCommerce

Unsupervised cross-lingual matching of product classifications

Matryoshka Peek: Toward Learning Fine-Grained, Robust, Discriminative Features for Product Search

Introducing a novel dataset for product matching: A new challenge for matching systems

A Flat-Hierarchical Approach Based on Machine Learning Model for e-Commerce Product Classification

Epems: An Entity Matching System For E-Commerce Products

Text Classification for Predicting Multi-level Product Categories

Multimodal Representation Learning-Based Product Matching

Online Similarity Learning with Feedback for Invoice Line Item Matching

A Deep Forest Method for Classifying E-Commerce Products by Using Title Information

Group based Self Training for E-Commerce Product Record Linkage

Identifying Substitute and Complementary Products for Assortment Optimization with Cleora Embeddings

End-to-End Neural Ranking for eCommerce Product Search: an application of task models and textual embeddings