Incorporating Density in Active Learning with Application to Ranking

Wenbin Cai, Ya Zhang
2012-01-01
Abstract: Active learning aims to achieve high performance with as little labeled training data as possible, thereby minimizing the cost of data labeling. In Web search ranking applications, learning to rank is an important task whose goal is to automatically build a ranking function through supervised learning. As in many supervised learning tasks, a large amount of labeled training data is required to train a high-quality ranking function. However, in many real-world learning-to-rank applications, data labeling is expensive and time-consuming. To reduce the labeling cost, many studies have applied active learning to ranking, aiming to select the most informative examples for manual labeling. Existing work, however, ignores prior information about data density, which can be useful for active learning. In this paper, we use the classical Kernel Density Estimation (KDE) method to infer information about data density. Then, under the Generalization Error Reduction (GER) framework, we propose a novel active learning strategy to select the most informative example that minimizes the generalization error. The proposed strategy is applied at the query level, at the document level, and at the query-document level with a two-stage active learning algorithm. Experimental results on a real-world Web search ranking dataset demonstrate the effectiveness of the proposed active learning algorithms.
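To make the role of density concrete, the following is a minimal sketch, not the authors' GER-based algorithm. It assumes a Gaussian kernel with a fixed bandwidth, and it assumes that each unlabeled example already carries some informativeness score (here a hypothetical `uncertainty` list). The function names `gaussian_kde` and `select_example` are illustrative only.

```python
import math


def gaussian_kde(points, bandwidth=1.0):
    """Estimate the density at each point with a Gaussian kernel over all points."""
    n = len(points)
    d = len(points[0])
    # Normalizing constant of a d-dimensional isotropic Gaussian.
    norm = (2.0 * math.pi * bandwidth ** 2) ** (d / 2.0)
    densities = []
    for x in points:
        total = 0.0
        for y in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        densities.append(total / (n * norm))
    return densities


def select_example(points, uncertainty, bandwidth=1.0):
    """Return the index with the largest density-weighted score.

    Assumption: the score is density * uncertainty, a common
    density-weighting heuristic, used here only for illustration.
    """
    densities = gaussian_kde(points, bandwidth)
    scores = [p * u for p, u in zip(densities, uncertainty)]
    return max(range(len(scores)), key=scores.__getitem__)


# Three clustered points and one outlier, all equally uncertain.
pool = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
chosen = select_example(pool, [1.0, 1.0, 1.0, 1.0])
```

When all examples are equally uncertain, the selection reduces to picking the example in the densest region, so an isolated outlier is not chosen merely because it is unusual.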