Generating True Relevance Labels in Chinese Search Engine Using Clickthrough Data

Hengjie Song,Chunyan Miao,Zhiqi Shen
DOI: https://doi.org/10.1609/aaai.v25i1.8088
2011-01-01
Proceedings of the AAAI Conference on Artificial Intelligence
Abstract:In current search engines, ranking functions are learned from a large number of labeled pairs in which the labels are assigned by human judges, describing how well the URLs match the different queries. However in commercial search engines, collecting high quality labels is time-consuming and labor-intensive. To tackle this issue, this paper studies how to produce the true relevance labels for pairs using clickthrough data. By analyzing the correlations between query frequency, true relevance labels and users’ behaviors, we demonstrate that the users who search the queries with similar frequency have similar search intents and behavioral characteristics. Based on such properties, we propose an efficient discriminative parameter estimation in a multiple instance learning algorithm (MIL) to automatically produce true relevance labels for pairs. Furthermore, we test our approach using a set of real world data extracted from a Chinese commercial search engine. Experimental results not only validate the effectiveness of the proposed approach, but also indicate that our approach is more likely to agree with the aggregation of the multiple judgments when strong disagreements exist in the panel of judges. In the event that the panel of judges is consensus, our approach provides more accurate automatic label results. In contrast with other models, our approach effectively improves the correlation between automatic labels and manual labels.
What problem does this paper attempt to address?