Context-Aware Ranking by Constructing a Virtual Environment for Reinforcement Learning

Junqi Zhang, Jiaxin Mao, Yiqun Liu, Ruizhe Zhang, Min Zhang, Shaoping Ma, Jun Xu, Qi Tian
DOI: https://doi.org/10.1145/3357384.3357945
2019-01-01
Abstract: Result ranking is one of the major concerns for Web search technologies. Most existing methodologies rank search results in descending order of a pointwise relevance estimate computed for each result in isolation, so dependencies between different search results are not taken into account. As search engine result pages contain more and more heterogeneous components, a better ranking strategy should be a context-aware process that optimizes result ranking globally. In this paper, we propose a novel framework which aims to improve context-aware listwise ranking performance by optimizing online evaluation metrics. The ranking problem is formalized as a Markov Decision Process (MDP) and solved with the reinforcement learning paradigm. To avoid imposing a great cost on online systems while training the ranking model, we construct a virtual environment from millions of historical click logs to simulate the behavior of real users. Extensive experiments on both simulated and real datasets show that 1) constructing a virtual environment can effectively leverage large-scale click logs and capture some important properties of real users, and 2) the proposed framework can improve search ranking performance by a large margin.
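To make the MDP framing concrete, the following is a minimal toy sketch, not the paper's implementation. It assumes a state is the list of results placed so far, an action is the next result to place, and the reward is a simulated click. The class `ToyRankingEnv`, the `relevance` table, and the `1 / (position + 1)` examination decay are all hypothetical stand-ins; the paper's virtual environment is instead built from historical click logs.

```python
import random


class ToyRankingEnv:
    """Toy ranking MDP (illustrative assumption, not the paper's environment).

    State:  tuple of doc ids already placed, top to bottom.
    Action: the doc id to place at the next position.
    Reward: 1.0 if the simulated user clicks the placed doc, else 0.0.
    """

    def __init__(self, relevance, list_length=3, seed=0):
        self.relevance = dict(relevance)  # hypothetical doc_id -> attractiveness in [0, 1]
        self.list_length = list_length
        self.rng = random.Random(seed)    # seeded so simulated clicks are reproducible
        self.ranked = []

    def reset(self):
        self.ranked = []
        return tuple(self.ranked)

    def candidates(self):
        # Docs that have not been placed yet.
        return [d for d in self.relevance if d not in self.ranked]

    def step(self, doc_id):
        if doc_id not in self.candidates():
            raise ValueError(f"{doc_id!r} is not an available candidate")
        position = len(self.ranked)  # 0-based rank of the slot being filled
        self.ranked.append(doc_id)
        # Assumed position-based click simulation: examination decays with rank.
        p_click = self.relevance[doc_id] / (position + 1)
        reward = 1.0 if self.rng.random() < p_click else 0.0
        done = len(self.ranked) >= self.list_length or not self.candidates()
        return tuple(self.ranked), reward, done


def run_episode(env, policy):
    """Roll out one ranking episode and return (final ranking, total clicks)."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(state, env.candidates())
        state, reward, done = env.step(action)
        total += reward
    return state, total


def greedy_policy(state, candidates, scores={"a": 0.9, "b": 0.1, "c": 0.5}):
    # Pointwise baseline: ignores the state (context) and picks the top score.
    return max(candidates, key=lambda d: scores[d])


env = ToyRankingEnv({"a": 0.9, "b": 0.1, "c": 0.5}, list_length=2)
ranking, clicks = run_episode(env, greedy_policy)
```

The greedy policy here ignores `state`, which corresponds to the pointwise ranking the abstract argues against; a context-aware policy would condition its choice on the results already placed.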