Sports Prediction and Betting Models in the Machine Learning Age: The Case of Tennis

Sascha Wilkens
DOI: https://doi.org/10.2139/ssrn.3506302
2019-01-01
SSRN Electronic Journal
Abstract: Machine learning and its numerous variants have become established tools in many areas, from financial services and autonomous robots to medical research. Several attempts have been made to apply machine learning to predicting the outcome of professional sports events and to exploit “inefficiencies” in the corresponding betting markets. The main focus has so far been on the soccer market, with tennis, one of the other major sports and betting marketplaces, receiving less attention. This paper takes one of the most extensive datasets, covering ten years of male and female professional singles matches, and analyzes two fundamental questions. First, can a variety of machine learning techniques (e.g., random forests) outperform simpler techniques such as logistic regression in predicting the outcome of matches? In this context, what is the informational content of betting market odds and of historical match and player data? Second, can the various modeling techniques be used to provide consistent positive returns for bettors? Across all analyzed models, the odds from bookmakers are found to encompass most of the information available to predict the outcomes of matches. Longer-term returns from betting strategies based on multiple prediction models and various money management strategies are mainly negative unless one assumes access to the most favorable market quotes. The analysis thus casts some doubt on studies that report an achievable “edge” for bettors.