Individual Judgments Versus Consensus: Estimating Query-URL Relevance.

Hengjie Song, Yonghui Xu, Huaqing Min, Qingyao Wu, Wei Wei, Jianshu Weng, Xiaogang Han, Qiang Yang, Jialiang Shi, Jiaqian Gu, Chunyan Miao, Nishida Toyoaki
DOI: https://doi.org/10.1145/2834122
2016-01-01
Abstract: Query-URL relevance, which measures the relevance of each retrieved URL with respect to a given query, is one of the fundamental criteria for evaluating the performance of commercial search engines. The traditional way to collect reliable and accurate query-URL relevance requires multiple annotators to provide individual judgments based on their subjective expertise (e.g., their understanding of user intents). In this case, the subjectivity reflected in each annotator's individual judgment (AIJ) inevitably affects the quality of the ground truth relevance (GTR). To the best of our knowledge, however, the potential impact of AIJs on estimating GTRs has not been studied or exploited quantitatively in existing work. This article first studies how multiple AIJs and GTRs are correlated. Our empirical studies find that multiple AIJs may provide additional cues that improve the accuracy of estimating GTRs. Inspired by this finding, we then propose a novel approach that integrates the multiple AIJs with the features characterizing query-URL pairs to estimate GTRs more accurately. Furthermore, we conduct experiments on a commercial search engine, Baidu.com, and report significant gains in terms of normalized discounted cumulative gain (NDCG).