Statistical Models for Match Prediction and Decision Making in Sport

Xiaojun Shi
2008-01-01
Abstract: In this study, we investigate models for the prediction of match outcome. These models are then used to aid decision-making. In particular, we consider batting strategy in test cricket. This model provides decision support for a team that is aiming to set a target at declaration. We also develop a measure of the importance of a match in a tournament. Such a measure may be of use in tournament design. Decision-making on the timing of a declaration in test cricket is considered using match outcome probabilities given the state of a game. Logistic regression is used to model the effect of two covariates, the target set and the overs remaining, on match outcome probabilities. This approach is then extended to establish batting strategy by considering run rate and the distribution of runs scored during a partnership. A decision tool for batting strategy towards a chosen target is established. The importance of a particular match in a tournament is measured given the outcomes of all other matches. This method is illustrated for the English Premiership. Match importance is calculated with respect to winning the Championship, relegation from the Premiership, qualifying for the UEFA Champions League and prize money. Match outcome probabilities for the match of interest are estimated using an ordinal logistic regression model. Covariates that represent the short- and long-term performance of the competing teams are used in this prediction model. This thesis makes the following contributions regarding the application of statistical methods in sport. A new quantitative approach that considers the optimum declaration time in test cricket is developed. We consider this modelling of fundamental playing strategy to be novel. We find that a zero-inflated negative binomial distribution is a good model for the distribution of runs scored in test cricket. The match importance measure that we describe extends an existing definition. The match outcome model we use for calculating match importance considers novel covariates related to the recent results of teams.
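
For readers unfamiliar with the zero-inflated negative binomial distribution mentioned above, the following is a minimal sketch of its standard probability mass function. It is not the thesis's fitted model: the function names and the parameter values (`pi`, `r`, `p`) are assumptions chosen purely for illustration. The distribution mixes a point mass at zero (weight `pi`) with an ordinary negative binomial count, which is one plausible way to allow for more zero scores than a plain negative binomial would predict.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial pmf: P(K = k) for k = 0, 1, 2, ...

    r > 0 is the (possibly non-integer) shape parameter and
    0 < p < 1 is the success probability; the mean is r * (1 - p) / p.
    """
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + r * math.log(p) + k * math.log1p(-p))


def zinb_pmf(k, pi, r, p):
    """Zero-inflated negative binomial pmf.

    With probability pi the count is a structural zero; otherwise it is
    drawn from the negative binomial distribution above.
    """
    base = (1.0 - pi) * nb_pmf(k, r, p)
    return pi + base if k == 0 else base


# Hypothetical parameter values, NOT estimates from the thesis.
pi, r, p = 0.1, 1.2, 0.04

# Evaluate the pmf far enough into the tail that the remaining mass is negligible.
probs = [zinb_pmf(k, pi, r, p) for k in range(2000)]
total = sum(probs)
mean = sum(k * q for k, q in enumerate(probs))

# Theoretical mean of the mixture: (1 - pi) * r * (1 - p) / p.
expected_mean = (1.0 - pi) * r * (1.0 - p) / p
```

In practice the three parameters would be estimated from scoring data, for example by maximum likelihood; the sketch only shows how the probabilities are formed once they are known.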