Biases in Expected Goals Models Confound Finishing Ability

Jesse Davis, Pieter Robberechts
2024-01-18
Abstract: Expected Goals (xG) has emerged as a popular tool for evaluating finishing skill in soccer analytics. It involves comparing a player's cumulative xG with their actual goal output, where consistent overperformance indicates strong finishing ability. However, the assessment of finishing skill in soccer using xG remains contentious due to players' difficulty in consistently outperforming their cumulative xG. In this paper, we aim to address the limitations and nuances surrounding the evaluation of finishing skill using xG statistics. Specifically, we explore three hypotheses: (1) the deviation between actual and expected goals is an inadequate metric due to the high variance of shot outcomes and limited sample sizes, (2) the inclusion of all shots in cumulative xG calculation may be inappropriate, and (3) xG models contain biases arising from interdependencies in the data that affect skill measurement. We found that sustained overperformance of cumulative xG requires both high shot volumes and exceptional finishing, that including all shot types can obscure the finishing ability of proficient strikers, and that a persistent bias makes the actual and expected goals of excellent finishers appear closer than they really are. Overall, our analysis indicates that more nuanced quantitative approaches are needed for investigating a player's finishing ability, which we address by using a technique from AI fairness to learn an xG model that is calibrated for multiple subgroups of players. As a concrete use case, we show that (1) the standard biased xG model underestimates Messi's goals above expected (GAX) by 17% and (2) Messi's GAX is 27% higher than that of the typical elite high-shot-volume attacker, indicating that Messi is an even more exceptional finisher than commonly believed.
Machine Learning, Applications
What problem does this paper attempt to address?
This paper examines the problems that arise when expected goals (xG) models are used to evaluate players' finishing skill in football analytics. The researchers propose three hypotheses:
1. Because of limited sample sizes and the high variance of shot outcomes, the deviation between actual and expected goals is not an effective measure of finishing skill.
2. Including all types of shots when calculating cumulative xG may be inappropriate, since finishing skill has multiple aspects (such as headers and long-range shots) that need to be analyzed separately.
3. xG models are biased because interdependencies in the data affect skill measurement, which leads them to understate the gap between actual and expected goals for excellent finishers.

Through simulation experiments and analysis of real data, the paper finds that:
- Players need both a high volume of shots and excellent finishing to consistently exceed their cumulative xG.
- Including all types of shots can mask the finishing ability of excellent strikers.
- Standard xG models carry a systematic bias that makes the actual and expected goals of excellent finishers appear closer than they really are.

Borrowing a technique from AI fairness, the paper proposes multicalibration to learn an xG model that is calibrated for multiple subgroups of players. Taking Messi as an example, it shows that (1) the standard, biased xG model underestimates Messi's goals above expected (GAX) by 17%, and (2) Messi's GAX is 27% higher than that of a typical elite high-shot-volume attacker, suggesting that Messi is an even more exceptional finisher than commonly believed.

In summary, the paper addresses the limitations of xG models in evaluating football players' finishing skill and calls for more refined quantitative methods to analyze finishing ability accurately.
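The first finding (sustained overperformance needs both high shot volume and exceptional finishing) can be illustrated with a small Monte Carlo sketch. The setup is an illustrative assumption, not the paper's simulation: each shot's xG is drawn uniformly from 0.02 to 0.30, each shot is an independent Bernoulli trial scored with probability `skill * xG` (capped at 1), and `skill = 1.0` means an exactly average finisher. The hypothetical function `prob_sustained_overperformance` estimates how often a player beats cumulative xG in every one of several seasons.

```python
import numpy as np

def prob_sustained_overperformance(shots_per_season, skill, n_seasons=5,
                                   n_sims=4_000, seed=0):
    """Estimate P(goals > cumulative xG in every one of n_seasons seasons).

    Illustrative assumptions (not taken from the paper):
      - each shot's xG ~ Uniform(0.02, 0.30)
      - the player's true scoring probability is min(skill * xG, 1)
      - shots are independent Bernoulli trials
    """
    rng = np.random.default_rng(seed)
    shape = (n_sims, n_seasons, shots_per_season)
    xg = rng.uniform(0.02, 0.30, size=shape)
    p_true = np.minimum(skill * xg, 1.0)
    goals = rng.random(shape) < p_true            # one Bernoulli draw per shot
    gax = goals.sum(axis=2) - xg.sum(axis=2)      # goals above expected, per season
    return float(np.mean(np.all(gax > 0, axis=1)))

# Compare low/high shot volume for an average and a better-than-average finisher.
for shots in (30, 150):
    for skill in (1.0, 1.2):
        p = prob_sustained_overperformance(shots, skill)
        print(f"{shots:>3} shots/season, skill={skill}: {p:.3f}")
```

Under these assumptions, for a better-than-average finisher (`skill > 1`) the expected GAX grows linearly with the number of shots while its standard deviation grows only with the square root, so more shots make beating xG season after season more likely; for `skill = 1.0`, extra volume does not help.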
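The second hypothesis is about aggregation. Goals above expected (GAX) is a player's goal count minus the summed xG of their shots, and a single total can hide opposite-signed contributions from different shot types. The sketch below computes GAX overall and per shot type; the `Shot` record, its fields, and the toy shots are hypothetical, made up purely for illustration and not data from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Shot:
    shot_type: str  # hypothetical category label, e.g. "foot" or "header"
    xg: float       # the xG model's scoring probability for this shot
    goal: bool      # whether the shot was scored

def gax(shots):
    """Goals above expected: actual goals minus cumulative xG."""
    return sum(s.goal for s in shots) - sum(s.xg for s in shots)

def gax_by_type(shots):
    """GAX computed separately for each shot type."""
    groups = defaultdict(list)
    for s in shots:
        groups[s.shot_type].append(s)
    return {t: gax(g) for t, g in groups.items()}

# Made-up toy data: 12 shots with the foot (5 scored), 8 headers (none scored).
shots = (
    [Shot("foot", 0.25, True)] * 5
    + [Shot("foot", 0.25, False)] * 7
    + [Shot("header", 0.25, False)] * 8
)
print(gax(shots))
print(gax_by_type(shots))
```

On this toy input, `gax(shots)` returns 0.0, yet `gax_by_type(shots)` returns `{'foot': 2.0, 'header': -2.0}`: a strong finisher with the foot looks merely average once poor headers are pooled in.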
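The proposed remedy borrows multicalibration from AI fairness: a predictor is multicalibrated when, for every subgroup in a chosen collection and every range of predicted values, the mean predicted probability matches the observed scoring rate. The summary does not spell out the authors' training procedure, so what follows is only a generic post-processing sketch in the style of the iterative patching algorithm of Hébert-Johnson et al. (2018), not the paper's implementation. The function name `multicalibrate`, the subgroup masks, the synthetic data, and the parameters `n_bins`, `alpha`, `min_size`, and `max_iter` are all illustrative assumptions.

```python
import numpy as np

def multicalibrate(p, y, groups, n_bins=10, alpha=0.01, min_size=50, max_iter=100):
    """Iteratively patch probabilities until, in every (subgroup, prediction-bin)
    cell with at least `min_size` shots, |mean(y - p)| <= alpha, or until
    `max_iter` passes have run.

    p      : initial xG predictions, shape (n,)
    y      : observed shot outcomes (0/1), shape (n,)
    groups : list of boolean masks of shape (n,), one per (hypothetical) subgroup
    """
    p = np.clip(np.asarray(p, dtype=float), 0.0, 1.0)  # new array; input untouched
    y = np.asarray(y, dtype=float)
    for _ in range(max_iter):
        updated = False
        # Recompute bin membership each pass because p keeps changing.
        bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
        for g in groups:
            for b in range(n_bins):
                cell = g & (bins == b)
                if cell.sum() < min_size:
                    continue  # too few shots to estimate the residual reliably
                residual = np.mean(y[cell] - p[cell])
                if abs(residual) > alpha:
                    # Shift the cell by its mean residual, keeping valid probabilities.
                    p[cell] = np.clip(p[cell] + residual, 0.0, 1.0)
                    updated = True
        if not updated:
            break
    return p

# Synthetic example: a subgroup that converts 30% more often than the base model predicts.
rng = np.random.default_rng(0)
n = 20_000
p0 = rng.uniform(0.05, 0.40, n)             # base xG predictions
elite = rng.random(n) < 0.2                 # hypothetical subgroup mask
true_p = np.where(elite, 1.3 * p0, p0)
y = (rng.random(n) < true_p).astype(float)
p1 = multicalibrate(p0, y, [np.ones(n, bool), elite, ~elite])
print(f"subgroup mean residual before: {np.mean(y[elite] - p0[elite]):+.3f}")
print(f"subgroup mean residual after:  {np.mean(y[elite] - p1[elite]):+.3f}")
```

Each patch shifts a cell's predictions by the cell's mean residual, which can only lower the squared error (clipping toward [0, 1] lowers it further), so the loop makes steady progress. Which subgroups to calibrate for is the key modeling choice, since it determines the reference population against which a player's GAX is measured.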