Abstract: Recently, Moffat et al. proposed an analytic framework, C/W/L/A, for offline evaluation metrics. This framework allows information retrieval (IR) researchers to design evaluation metrics by flexibly combining user browsing models with user gain aggregations. However, the statistical stability of C/W/L/A metrics under different aggregations has not yet been investigated. In this study, we investigate the statistical stability of C/W/L/A metrics from three perspectives: (1) the system ranking similarity among aggregations, (2) the system ranking consistency of aggregations and (3) the discriminative power of aggregations. More specifically, we combine various aggregation functions with the browsing models of Precision, Discounted Cumulative Gain (DCG), Rank-Biased Precision (RBP), INST, Average Precision (AP) and Expected Reciprocal Rank (ERR), and examine their performance in terms of system ranking similarity, system ranking consistency and discriminative power on two offline test collections. Our experimental results suggest that, in terms of system ranking consistency and discriminative power, the expected rate of gain (ERG) aggregation performs outstandingly, while the maximum relevance aggregation usually performs poorly. The results also suggest that Precision, DCG, RBP, INST and AP with their canonical aggregations all perform favourably in system ranking consistency and discriminative power; for ERR, however, replacing its canonical aggregation with ERG further strengthens the discriminative power while still producing a system ranking similar to that of the canonical version.
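To make the idea of pairing a browsing model with an aggregation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the usual C/W/L convention that a browsing model is given by a continuation probability C(i), and that the ERG aggregation is the gain vector weighted by the normalised probability of viewing each rank. The function names, and the choice to normalise over the evaluated depth only, are assumptions made for illustration.

```python
from typing import Callable, List


def viewing_weights(continuation: Callable[[int], float], depth: int) -> List[float]:
    """Weights W(i), proportional to the probability that the user views rank i.

    Assumption: weights are normalised over the evaluated depth only,
    not over an infinite ranking.
    """
    view_probs = []
    prob = 1.0  # the user always inspects rank 1
    for rank in range(1, depth + 1):
        view_probs.append(prob)
        prob *= continuation(rank)  # chance of moving on to rank + 1
    total = sum(view_probs)
    return [v / total for v in view_probs]


def expected_rate_of_gain(gains: List[float], continuation: Callable[[int], float]) -> float:
    """ERG aggregation: sum of W(i) * gain(i) over the ranked list."""
    weights = viewing_weights(continuation, len(gains))
    return sum(w * g for w, g in zip(weights, gains))


def precision_continuation(k: int) -> Callable[[int], float]:
    """Browsing model of Precision@k: always continue until rank k, then stop."""
    return lambda rank: 1.0 if rank < k else 0.0


def rbp_continuation(p: float) -> Callable[[int], float]:
    """Browsing model of RBP: continue with constant persistence p."""
    return lambda rank: p


# Hypothetical gain vector for one topic (1 = relevant, 0 = not relevant).
gains = [1.0, 0.0, 1.0, 0.0]
erg_precision = expected_rate_of_gain(gains, precision_continuation(4))
erg_rbp = expected_rate_of_gain(gains, rbp_continuation(0.5))
```

With the Precision@4 browsing model, every rank receives weight 1/4, so ERG reduces to ordinary precision on this gain vector. Swapping in a different continuation function changes the browsing model without touching the aggregation, which is the kind of recombination the abstract describes.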

A Comparative Study on the Combination of Multiple Retrieval Systems

Comparing System Selection Methods for the Combinatorial Fusion of Multiple Retrieval Systems

Combination of Multiple Retrieval Systems Using Rank-Score Function and Cognitive Diversity

Combining Multiple Retrieval Systems Using Combinatorial Fusion Analysis and Rank-Score Characteristic Function

Improved Combination of Multiple Retrieval Systems Using a Dynamic Combinatorial Fusion Algorithm

Combining similarity measures in content-based image retrieval guided by mutual information

TagCombine: Recommending Tags to Contents in Software Information Sites

Sequential Combination Methods for Data Clustering Analysis

Fusion of effective retrieval strategies in the same information retrieval system

Efficient information retrieval based on a combination of vector space and probabilistic models

Combining Strategies For Xml Retrieval

Combining multiple sources for short query translation in Chinese-English cross-language information retrieval

A Comparison Between Term-Based and Embedding-Based Methods for Initial Retrieval

Analysis of Methods for Novel Case Selection

Multiple testing in statistical analysis of systems-based information retrieval experiments

Resources and Evaluations for Multi-Distribution Dense Information Retrieval

Revisiting The Evaluation Of Diversified Search Evaluation Metrics With User Preferences

A Meta-Evaluation of C/W/L/A Metrics: System Ranking Similarity, System Ranking Consistency and Discriminative Power

A comparative study on ranking and selection strategies for multi-document summarization

Combining Belief Functions Based on Three Mechanisms: Average, Multiplication and Intersection

Tri-space and Ranking Based Heterogeneous Similarity Measure for Cross-Media Retrieval