Abstract: Choosing the most suitable metric to validate the merits of a newly proposed model is anything but straightforward. Comparing rankings introduces formidable challenges of its own, and a universal metric applicable to all scenarios most likely does not exist. Furthermore, metrics designed for a specific context, such as Recommender Systems, are sometimes carried over to other domains without a comprehensive grasp of their underlying mechanisms, resulting in unforeseen outcomes and potential misuse. Complicating matters further, distinct metrics may emphasize different aspects of a ranking, frequently leading to seemingly contradictory comparisons of model results and undermining the trustworthiness of evaluations. We examine these issues in the domain of ranking evaluation metrics. First, we present instances that result in inconsistent evaluations, a source of potential mistrust in commonly used metrics; by quantifying the frequency of such disagreements, we show that they are common in rankings. We then conceptualize rankings using the mathematical formalism of symmetric groups, detaching them from the domains in which the metrics were created; through this approach, we can rigorously establish essential mathematical properties of ranking evaluation metrics, which are needed for a deeper understanding of the source of inconsistent evaluations. We close with a discussion connecting our theoretical analysis to practical applications, highlighting which properties are important in each domain where rankings are commonly evaluated. In conclusion, our analysis sheds light on ranking evaluation metrics, highlighting that inconsistent evaluations should be seen not as a source of mistrust but as a sign of the need to choose carefully how we evaluate our models in the future.

Wikiometrics: A Wikipedia Based Ranking System

A Comparative Study Of Academic And Wikipedia Ranking

Mining the Rank of Universities with Wikipedia

World influence and interactions of universities from Wikipedia networks

Longitudinal Assessment of Reference Quality on Wikipedia

Quantifying Engagement with Citations on Wikipedia

Quantitative Analysis of the Top Ten Wikipedias

Academic Ranking with Web Mining and Axiomatic Analysis

Refining Hierarchies Of Public Knowledge Spheres By Mutual Awareness Of Keywords - Towards A More Versatile Wikipedia

Highlighting Entanglement of Cultures via Ranking of Multilingual Wikipedia Articles

Measuring University Impact: Wikipedia approach

Ranking evaluation metrics from a group-theoretic perspective

Interactions of Cultures and Top People of Wikipedia from Ranking of 24 Language Editions

"We Need a Woman in Music": Exploring Wikipedia's Values on Article Priority

Analysis of the Wikipedia Network of Mathematicians

Capturing the influence of geopolitical ties from Wikipedia with reduced Google matrix

Wiki-index of authors popularity

Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia

The distorted mirror of Wikipedia: a quantitative analysis of Wikipedia coverage of academics

From academic to media capital: To what extent does the scientific reputation of universities translate into Wikipedia attention?

MediaRank: Computational Ranking of Online News Sources