On the Limitations of Elo: Real-World Games are Transitive, not Additive

Quentin Bertrand, Wojciech Marian Czarnecki, Gauthier Gidel
DOI: https://doi.org/10.48550/arXiv.2206.12301
2023-03-07
Abstract: Real-world competitive games, such as chess, go, or StarCraft II, rely on Elo models to measure the strength of their players. Since these games are not fully transitive, using Elo implicitly assumes they have a strong transitive component that can correctly be identified and extracted. In this study, we investigate the challenge of identifying the strength of the transitive component in games. First, we show that Elo models can fail to extract this transitive component, even in elementary transitive games. Then, based on this observation, we propose an extension of the Elo score: we end up with a disc ranking system that assigns each player two scores, which we refer to as skill and consistency. Finally, we propose an empirical validation on payoff matrices coming from real-world games played by bots and humans.
Computer Science and Game Theory, Machine Learning
What problem does this paper attempt to address?
This paper addresses the limitations of the widely used Elo rating system as a measure of player strength. Elo models the outcome of a match additively, as a function of the difference between the two players' ratings, and therefore implicitly assumes that a game has a strong transitive component that can be correctly identified and extracted. The paper shows that Elo can fail to extract this component even in elementary transitive games, and the assumption is weakest in games with strong non-transitivity, such as "StarCraft II". The goals of the paper are as follows:

1. **Identify the transitive component of a game**: The authors study how to quantify, from actual data, the degree of transitivity in two-player zero-sum games. Cyclic behaviors have been observed in many real-world games (such as "StarCraft II"), and the traditional Elo rating system does not handle them well.
2. **Propose a new ranking system**: Building on the limitations identified for Elo, the paper proposes a ranking system intended to extract the transitive component of a game more accurately. It does so by decomposing the empirical payoff matrix using the concept of the disc game, a game model that can be completely transitive or completely cyclic. The resulting disc ranking system assigns each player two scores, which the authors call skill and consistency.
3. **Improve prediction performance**: In the empirical analysis, the proposed ranking system predicts the results of new match-ups better than the traditional Elo rating system.

### Main contributions of the paper

- **Disc decomposition**: The paper proposes a disc decomposition theorem (Theorem 2), which gives a quantitative, operational definition of the degree of transitivity in real-world games.
- **Empirical analysis**: The authors compute the amount of transitivity in several real-world games (including chess and "StarCraft II") and show improvements in predicting the results of new match-ups.
- **Algorithm implementation**: The paper also provides optimization details, including how to handle missing and infinite entries and how to update the ranking system online.

In short, the paper aims to overcome the limitations of the Elo rating system by proposing a new ranking system that gives more accurate player rankings and match-up predictions, especially for games with strong non-transitivity.
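The contrast between an additive Elo model and a disc game can be illustrated with a minimal Python sketch. The Elo formula is the standard base-10 logistic expected score. The disc form `logit P(i beats j) = u_i * v_j - v_i * u_j` is an assumption made here for illustration (one common way of writing a disc game), and the names `u` and `v` are hypothetical stand-ins for the two per-player scores; the paper's exact parameterization of skill and consistency may differ.

```python
import math


def elo_win_probability(rating_a, rating_b, scale=400.0):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def disc_win_probability(u_i, v_i, u_j, v_j):
    """Win probability of player i over player j under one disc game.

    Assumed form (illustrative only): logit P = u_i * v_j - v_i * u_j.
    The payoff is antisymmetric, so P(i beats j) + P(j beats i) = 1.
    """
    logit = u_i * v_j - v_i * u_j
    return 1.0 / (1.0 + math.exp(-logit))


# Additive special case: fixing v = 1 for every player reduces the logit
# to u_i - u_j, i.e. a difference of one score per player, as in Elo.
p_additive = disc_win_probability(2.0, 1.0, 0.5, 1.0)

# Fully cyclic case: three players placed 120 degrees apart on the unit
# circle. Each beats the next one with the same probability, which no
# single-score (additive) rating can reproduce.
players = [
    (math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3))
    for k in range(3)
]
p_0_beats_1 = disc_win_probability(*players[0], *players[1])
p_1_beats_2 = disc_win_probability(*players[1], *players[2])
p_2_beats_0 = disc_win_probability(*players[2], *players[0])
```

With two numbers per player, the same functional form covers both a purely transitive ordering and a rock-paper-scissors cycle, which is the property of disc games noted above.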