Automatic Search Engine Performance Evaluation With The Wisdom Of Crowds

Rongwei Cen, Yiqun Liu, Min Zhang, Liyun Ru, Shaoping Ma
DOI: https://doi.org/10.1007/978-3-642-04769-5_31
2009-01-01
Abstract: Relevance evaluation is an important topic in Web search engine research. Traditional evaluation methods rely on a large amount of human effort, which makes the process extremely time-consuming in practice. By analyzing large-scale user query logs and click-through data, we propose a performance evaluation method that automatically generates large-scale Web search topics and answer sets under the Cranfield framework. These query-to-answer pairs are used directly in relevance evaluation with several widely adopted precision- and recall-related retrieval performance metrics. Besides single search engine log analysis, we propose user behavior models built on the click-through logs of multiple search engines to reduce potential bias among different search engines. Experimental results show that the evaluation results are similar to those obtained by traditional human annotation, and that our method avoids the bias and subjectivity of the manual expert judgments used in traditional approaches.
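The idea of turning click-through logs into query-to-answer pairs and scoring a ranking against them can be illustrated with a minimal sketch. It is not the authors' user behavior model: the click-share rule, the thresholds `min_clicks` and `min_share`, the function names, and the choice of mean reciprocal rank as the metric are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_answer_sets(click_log, min_clicks=3, min_share=0.5):
    """Map each query to the URLs that attract at least min_share of its clicks.

    click_log is an iterable of (query, clicked_url) pairs. The thresholds
    are hypothetical; the paper's own selection criteria may differ.
    """
    clicks = defaultdict(Counter)
    for query, url in click_log:
        clicks[query][url] += 1

    answers = {}
    for query, counts in clicks.items():
        total = sum(counts.values())
        if total < min_clicks:
            continue  # too little click evidence for this query
        chosen = {u for u, c in counts.items() if c / total >= min_share}
        if chosen:
            answers[query] = chosen
    return answers


def mean_reciprocal_rank(rankings, answers):
    """Average 1/rank of the first answer URL, over queries with an answer set.

    rankings maps a query to the ordered list of URLs a search engine returned.
    A query whose answers never appear in its ranking contributes 0.
    """
    scores = []
    for query, answer_urls in answers.items():
        rr = 0.0
        for rank, url in enumerate(rankings.get(query, []), start=1):
            if url in answer_urls:
                rr = 1.0 / rank
                break
        scores.append(rr)
    return sum(scores) / len(scores) if scores else 0.0
```

In this sketch, a query with too few clicks is dropped rather than given an unreliable answer, which mirrors the general motivation of relying on aggregate user behavior instead of individual judgments.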