A Unified View of Loss Functions in Learning to Rank
Tie-Yan Liu, Wei Chen, Yanyan Lan
Abstract: This paper provides a unified view of the loss functions used in learning to rank. The loss function is a key component in learning to rank, because it encodes human knowledge about how rankings are evaluated and guides the learning process. Many loss functions have been proposed in the learning-to-rank literature, with different forms and different motivations, and they have been used to develop various algorithms. However, several questions about these loss functions have not yet been answered well: (i) what is the relationship among them? (ii) can optimizing them lead to the maximization of IR measures such as NDCG? (iii) how can we modify them to improve the performance of learning algorithms? In this paper, we try to answer these questions by proposing a quantity named ‘the unified loss for ranking’. The unified loss is defined by modeling ranking as a sequence of classification tasks and assigning different weights to the tasks to reflect their importance in the overall ranking problem. With regard to the unified loss, we obtain three theoretical results. First, the loss functions in most existing algorithms, such as Ranking SVM, RankNet, RankBoost, and ListMLE, are margin-based surrogate losses (and also upper bounds) of the unified loss with uniform weights. Second, the unified loss is an upper bound of one minus NDCG when the weights are set according to the position discount function and the gain function in NDCG. Third, if we modify the loss functions of existing algorithms by introducing the weights of the unified loss, they remain upper bounds of the corresponding unified loss, and thus upper bounds of one minus NDCG. Based on these results, we can expect that minimizing the modified loss functions will effectively maximize NDCG. We have performed experiments on benchmark datasets to validate our theoretical findings. The results show that the ranking performance of models trained with the modified loss functions can be much better than that of models trained with the original loss functions.
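To make the quantity "one minus NDCG" concrete, the following is a minimal Python sketch of NDCG for a single query. It assumes the commonly used gain function 2^rel - 1 and position discount 1/log2(1 + rank); the abstract does not state which gain and discount are used, so both are assumptions here. The function names `dcg`, `ndcg`, and `one_minus_ndcg` are illustrative and not taken from the paper, and the sketch does not implement the unified loss itself.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance labels listed in ranked order.

    Assumed gain: 2**rel - 1. Assumed discount: 1 / log2(1 + rank), rank from 1.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(scores, labels):
    """NDCG of the ranking induced by sorting documents by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked_labels = [labels[i] for i in order]
    ideal_dcg = dcg(sorted(labels, reverse=True))
    if ideal_dcg == 0:
        # No relevant documents: return 0.0 by convention (an assumption).
        return 0.0
    return dcg(ranked_labels) / ideal_dcg


def one_minus_ndcg(scores, labels):
    """The measure-based ranking error that the abstract says is upper bounded."""
    return 1.0 - ndcg(scores, labels)


if __name__ == "__main__":
    labels = [2, 1, 0]  # graded relevance judgments for three documents
    print(one_minus_ndcg([3.0, 2.0, 1.0], labels))  # scores agree with labels
    print(one_minus_ndcg([1.0, 2.0, 3.0], labels))  # scores reverse the labels
```

Under these assumptions, a scoring function that orders documents exactly by relevance gives a value of zero, and any misordering gives a value strictly between zero and one whose size depends on where in the list the misordered documents fall.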