Evaluation of the Default Similarity Function in Lucene

Hui Fang, Chengxiang Zhai
Abstract: Lucene [4, 3] is a popular open-source IR toolkit that has been widely used in many search-related applications [5]. However, there has been no study evaluating the retrieval performance of the default retrieval function implemented in Lucene. An improved retrieval function would enable all applications built on Lucene, such as Nutch, to achieve higher search accuracy. It is therefore worthwhile to evaluate quantitatively how well Lucene's retrieval function performs compared with a state-of-the-art retrieval function. In this report, we evaluate the default retrieval function of Lucene on three representative evaluation collections [6] and compare it with a state-of-the-art retrieval function, namely the F2-EXP axiomatic retrieval function proposed in [2]. Experiments show that the default function performs worse than the axiomatic retrieval function, suggesting that the axiomatic function is a good alternative that could be implemented in Lucene.
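To make the two functions being compared concrete, here is a minimal sketch that scores a toy collection with simplified versions of both. It is not the code used in the report's experiments. The Lucene side follows the classic TF-IDF practical scoring formula of the DefaultSimilarity era (coord · queryNorm · Σ sqrt(tf) · idf² · lengthNorm, with idf = 1 + ln(N/(df+1)) and lengthNorm = 1/sqrt(|D|)) but omits boosts and the lossy one-byte encoding of the length norm. The F2-EXP side uses Σ c(t,Q) · (N/df(t))^k · c(t,D) / (c(t,D) + s + s·|D|/avdl), with k = 0.35 and s = 0.5 taken as assumed values (s is normally tuned per collection). The names `classic_score`, `f2exp_score`, and `doc_freq` are hypothetical.

```python
import math
from collections import Counter


def doc_freq(term, docs):
    """Number of documents (token lists) containing the term."""
    return sum(1 for d in docs if term in d)


def classic_score(query, doc, docs):
    """Simplified Lucene classic TF-IDF score (assumption: no boosts,
    no byte-encoded norms; each query token is treated as one clause)."""
    if not query or not doc:
        return 0.0
    n = len(docs)
    tf = Counter(doc)
    # idf = 1 + ln(N / (df + 1)); always positive when df <= N
    idf = {t: 1.0 + math.log(n / (doc_freq(t, docs) + 1)) for t in set(query)}
    # queryNorm = 1 / sqrt(sum of squared query-term weights)
    query_norm = 1.0 / math.sqrt(sum(idf[t] ** 2 for t in query))
    length_norm = 1.0 / math.sqrt(len(doc))
    matched = [t for t in query if tf[t] > 0]
    if not matched:
        return 0.0
    coord = len(matched) / len(query)  # fraction of query clauses matched
    total = sum(math.sqrt(tf[t]) * idf[t] ** 2 * length_norm for t in matched)
    return coord * query_norm * total


def f2exp_score(query, doc, docs, s=0.5, k=0.35):
    """Sketch of the F2-EXP axiomatic function; s and k are assumed values."""
    n = len(docs)
    avdl = sum(len(d) for d in docs) / n  # average document length
    tf = Counter(doc)
    score = 0.0
    for t, qtf in Counter(query).items():
        df = doc_freq(t, docs)
        if tf[t] == 0 or df == 0:
            continue
        # exponential IDF times a saturating, length-normalized TF
        score += qtf * (n / df) ** k * tf[t] / (tf[t] + s + s * len(doc) / avdl)
    return score


if __name__ == "__main__":
    docs = [
        ["lucene", "search", "engine"],
        ["axiomatic", "retrieval", "function"],
        ["lucene", "lucene", "index"],
    ]
    query = ["lucene", "retrieval"]
    for scorer in (classic_score, f2exp_score):
        ranking = sorted(range(len(docs)),
                         key=lambda i: scorer(query, docs[i], docs),
                         reverse=True)
        print(scorer.__name__, ranking)
```

Both functions reward repeated query terms and penalize long documents, but they do so differently: the classic function grows term frequency as an unbounded square root and normalizes by 1/sqrt(|D|), while F2-EXP saturates term frequency and normalizes length relative to the collection average.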