LT2R: Learning to Online Learning to Rank for Web Search

Xiaokai Chu, Changying Hao, Shuaiqiang Wang, Dawei Yin, Jiashu Zhao, Lixin Zou, Chenliang Li
DOI: https://doi.org/10.1109/icde60146.2024.00360
2024-01-01
Abstract: Online learning to rank (OLTR), which directly optimizes the ranker with interactive user feedback, has gained considerable attention in both academia and industry. However, most current approaches rely on inefficient heuristic exploration strategies, which can seriously hurt the user experience. Furthermore, existing OLTR solutions fail to learn from cost-effective logged data, which blocks their use in real industrial systems. To address these issues, this paper introduces a new OLTR framework, LT2R (Learning To online Learning to Rank). LT2R aims to learn an efficient parameterized exploration strategy with which a ranker can converge to the optimal ranking in as few exploration steps as possible. Specifically, we formulate the OLTR task as a Markov Decision Process and introduce an online reinforcement learning algorithm with a multi-round cumulative reward to guarantee fast convergence. Moreover, we contribute an offline learning algorithm for LT2R that exploits knowledge from historical search logs and can provide a fair warm-up model for industrial deployment. Extensive experiments on both benchmark datasets and the Baidu search engine demonstrate the superiority of LT2R over state-of-the-art methods.
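The abstract does not spell out the reward definition, so the following is only a minimal sketch of what a "multi-round cumulative reward" could look like. It assumes, purely for illustration, that each exploration round is scored by the DCG of the presented ranking and that rounds are combined with a discount factor; the names `dcg`, `cumulative_reward`, and `gamma` are hypothetical and do not come from the paper.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance labels.

    Assumption: used here as a stand-in per-round reward signal.
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def cumulative_reward(per_round_rewards, gamma=0.9):
    """Discounted sum of rewards collected over successive exploration rounds.

    Assumption: `gamma` is an illustrative discount factor, not a value
    taken from the paper.
    """
    return sum((gamma ** t) * r for t, r in enumerate(per_round_rewards))


# Three hypothetical rounds: relevance labels in the order the ranker showed them.
rounds = [[0, 1, 0], [1, 0, 0], [1, 1, 0]]
rewards = [dcg(r) for r in rounds]
total = cumulative_reward(rewards)
```

Under this toy setup, an exploration strategy that moves relevant documents to the top in earlier rounds earns a larger total, which matches the stated goal of converging in as few exploration steps as possible.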