Cross-Lingual Product Retrieval in E-Commerce Search
Wenya Zhu, Xiaoyu Lv, Baosong Yang, Yinghua Zhang, Xu Yong, Linlong Xu, Yinfu Feng, Haibo Zhang, Qing Da, Anxiang Zeng, Ronghua Chen
DOI: https://doi.org/10.1007/978-3-031-05936-0_36
2022-01-01
Abstract: Cross-lingual product retrieval (CLPR) recalls products that are semantically relevant to multilingual search queries. It plays a crucial role in helping e-commerce sites serve cross-border customers. However, no public large-scale CLPR dataset exists, which hinders research on this topic. We present CLPR-9M (https://tianchi.aliyun.com/dataset/dataDetail?dataId=121505), the first large-scale CLPR dataset, containing 9 million query-product pairs mined from real-world user logs and covering 10 major commodity categories and 3 language pairs. We also release a test set annotated by bilingual experts with fine-grained labels. We build our baselines on the widely used cross-lingual embedding retrieval framework and improve it in several respects: the pretrain-finetune paradigm, negative sampling, and the optimization objective. We report benchmark results under multiple evaluation metrics, which we expect to benefit future research in this area.
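
Illustrative sketch (not the authors' implementation): the abstract names an embedding retrieval framework with negative sampling and an optimization objective but gives no specifics. The Python below assumes a generic setup in which queries and products are already encoded as vectors in a shared space, retrieval ranks products by cosine similarity, and training uses a softmax loss with in-batch negatives. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(query_vec, product_vecs, k=2):
    """Return the indices of the k products most similar to the query."""
    q = normalize(query_vec)
    scores = [dot(q, normalize(p)) for p in product_vecs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def in_batch_softmax_loss(query_vecs, product_vecs, temperature=0.1):
    """Mean cross-entropy over a batch of aligned (query, product) pairs.

    Product i is the positive for query i; every other product in the
    batch serves as a negative (an assumed, commonly used sampling scheme).
    """
    qs = [normalize(q) for q in query_vecs]
    ps = [normalize(p) for p in product_vecs]
    total = 0.0
    for i, q in enumerate(qs):
        logits = [dot(q, p) / temperature for p in ps]
        m = max(logits)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(qs)


# Toy embeddings standing in for encoder outputs (hypothetical values).
queries = [[1.0, 0.0], [0.0, 1.0]]
products = [[0.9, 0.1], [0.1, 0.9]]

top = retrieve(queries[0], products, k=1)
aligned = in_batch_softmax_loss(queries, products)
shuffled = in_batch_softmax_loss(queries, products[::-1])
```

In this toy batch the loss is lower when each query is paired with its matching product than when the pairs are swapped, which is the signal such an objective uses to pull matching query and product embeddings together.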