A Comparative Study on the Combination of Multiple Retrieval Systems

Chun Yi Liu, Chuan Yi Tang, D. Frank Hsu
DOI: https://doi.org/10.1109/i-span.2012.31
2012-01-01
Abstract: It is known that combining multiple information retrieval systems can, in many cases, yield better performance than any of the individual systems. It is also known that, in these cases, the improvement depends mainly on two conditions: (a) the performance of each individual system, and (b) the diversity between the individual systems. However, quantifying these two conditions remains a challenging problem. In this paper, we investigate these issues using five TREC datasets, TREC 2-6 (1993-97). Six systems are selected from each dataset, either at random or by precision. We then compare the performance of combining the six randomly selected systems vs. the six systems selected by precision in each dataset. We demonstrate that, in each of the five datasets, the sum x + y is larger for positive cases (the combination of A and B performs better than or equal to both individual systems) than for negative cases (all other cases), where x is the performance ratio P_l/P_h (the lower of the two systems' performances divided by the higher) and y is the diversity between A and B, both normalized to [0, 1]. In addition, we demonstrate that combinations of t systems, t = 2, 3, 4, 5, and 6, perform better overall when drawn from the six systems selected by precision than when drawn from the six systems selected at random.
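The quantity x + y described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how diversity is measured, so `rank_diversity` below is a hypothetical stand-in (a normalized rank-displacement measure over two rankings of the same documents), not the authors' definition. The function names, the example precision values, and the document identifiers are likewise assumptions made for illustration only.

```python
def performance_ratio(p_a, p_b):
    """x = P_l / P_h: the lower performance divided by the higher, in [0, 1]."""
    low, high = min(p_a, p_b), max(p_a, p_b)
    if high == 0:
        return 0.0  # assumption: treat the ratio as 0 when both systems score 0
    return low / high


def rank_diversity(ranking_a, ranking_b):
    """Hypothetical diversity y in [0, 1] (NOT the paper's measure).

    Sums the absolute difference in rank position of each document across
    the two rankings, then divides by the largest value that sum can take
    for n documents, which is floor(n^2 / 2) (one ranking reversed).
    Both rankings must contain the same documents.
    """
    n = len(ranking_a)
    if n < 2:
        return 0.0
    position_in_b = {doc: i for i, doc in enumerate(ranking_b)}
    total = sum(abs(i - position_in_b[doc]) for i, doc in enumerate(ranking_a))
    return total / ((n * n) // 2)


# Illustrative inputs only; these are not results from the paper.
system_a = ["d1", "d2", "d3", "d4"]
system_b = ["d4", "d3", "d2", "d1"]  # exact reverse of system_a

x = performance_ratio(0.2, 0.4)
y = rank_diversity(system_a, system_b)
combined = x + y
```

Under this sketch, two systems with similar performance (x near 1) and very different rankings (y near 1) give a large x + y, which is the regime the abstract associates with positive cases.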