Adversarially Trained Environment Models are Effective Policy Evaluators and Improvers: An Application to Information Retrieval
Yao Li, Yifan Liu, Xinyi Dai, Jianghao Lin, Hang Lai, Yunfei Liu, Yong Yu
DOI: https://doi.org/10.1145/3627676.3627680
2023-01-01
Abstract: The essence of information retrieval (IR) is to find the most useful information items (or documents) for a user's information need and to present them to the user as a ranking list. The widely used evaluation metrics for a ranking list, such as NDCG, MAP, and hit ratio, rest on strong assumptions about how users examine and click items when interacting with the list. In modern IR scenarios, user behavior has been shown to be highly personalized, diverse, and dynamic, so these assumptions often fail. Click models (CMs) have been proposed to learn such complex user behavior. However, most existing work on CMs still focuses on fitting logged behavior data, and little attention has been paid to how CMs can both evaluate and improve rankers effectively. In this paper, we perform an in-depth investigation into how a CM can simultaneously evaluate and improve the ranking policy for IR. Specifically, we first theoretically analyze the discrepancy in evaluated performance between a learned click model and the real user. Based on this analysis, we discuss the principles of learning a good CM that reduces this discrepancy, and accordingly propose a novel rank-oriented click model (RankCM). Furthermore, we conduct extensive experiments on how a CM can both evaluate and improve a ranker effectively, based on four tasks: data fitting, click simulation, policy ranking, and policy improvement. On these tasks, RankCM demonstrates comprehensive effectiveness and superiority over the compared CMs. Our study argues that adversarially trained environment models are effective policy evaluators and improvers, and this work serves as an application of that claim to information retrieval. The innovations arising from the experiments and promising directions for future investigation are also discussed.
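To make the point about metric assumptions concrete, the following is a minimal sketch of NDCG, one of the metrics named in the abstract. It is not taken from the paper; the function names `dcg` and `ndcg` and the exponential gain `2^r - 1` are illustrative choices (a linear gain is also common). The fixed `1 / log2(rank + 1)` discount is where the assumption about user examination behavior enters: every user is assumed to pay less attention to lower positions in exactly the same way.

```python
import math


def dcg(relevances, k=None):
    """Discounted cumulative gain of a ranked list of graded relevance labels.

    The discount 1 / log2(rank + 1) is fixed in advance, i.e. it assumes
    the same position-based examination pattern for every user.
    """
    rels = relevances[:k] if k is not None else relevances
    # enumerate() is 0-based, so rank = i + 1 and the discount is log2(i + 2).
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))


def ndcg(relevances, k=None):
    """DCG normalized by the DCG of the ideal (relevance-sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical relevance labels for one query, in ranked order.
    print(ndcg([1, 0]))  # relevant item ranked first
    print(ndcg([0, 1]))  # relevant item ranked second, scores lower
```

A learned click model replaces this hand-fixed discount with examination and click behavior estimated from logged data, which is the setting the abstract investigates.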