Abstract: This paper introduces an approach to predicting the next event in a soccer match, a challenge bearing remarkable similarities to the problem faced by Large Language Models (LLMs). Unlike other methods that severely limit event dynamics in soccer, often abstracting from many variables or relying on a mix of sequential models, our research proposes a novel technique inspired by the methodologies used in LLMs. These models predict a complete chain of variables that compose an event, significantly simplifying the construction of Large Event Models (LEMs) for soccer. Utilizing deep learning on the publicly available WyScout dataset, the proposed approach notably surpasses the performance of previous LEM proposals in critical areas, such as the prediction accuracy of the next event type. This paper highlights the utility of LEMs in various applications, including match prediction and analytics. Moreover, we show that LEMs provide a simulation backbone for users to build many analytics pipelines, an approach opposite to the current specialized single-purpose models. LEMs represent a pivotal advancement in soccer analytics, establishing a foundational framework for multifaceted analytics pipelines through a singular machine-learning model.
What problem does this paper attempt to address?
This paper addresses the problem of predicting the next event in a football match. Specifically, the authors propose a new method for constructing Large Event Models (LEMs) that is inspired by Large Language Models (LLMs) and predicts the sequence of events in a match more accurately than previous LEM proposals. Compared with earlier approaches, it both simplifies the model architecture and improves prediction accuracy, especially for the next event type. The method also supports a range of applications, such as betting, match analysis tools, and simulation and scenario planning.
### Main contributions of the paper
1. **Improved model architecture**:
- By learning from the core methods of LLMs, the football event data is tokenized, enabling a single model to effectively learn the "language of football events".
- The model captures the most relevant aspects of football events through sequential inference, with each inference step producing one token that represents a part of the event.
2. **Higher prediction accuracy**:
- Significantly surpasses previous LEMs in the prediction accuracy of key variables, especially in the prediction of event types.
- In experiments, the new model performs well on several prediction targets, including event type, event accuracy (whether the event was accurate), and spatial coordinates.
3. **Wide application potential**:
- Outlines the potential of LEMs in several fields, such as betting, match analysis tools, and simulation and scenario planning.
- By generating complex event sequences, LEMs can provide valuable insights to help users make more informed decisions.
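
The tokenization idea in the first contribution can be illustrated with a minimal sketch. It assumes each event is flattened into one token per feature and that training pairs are formed by predicting the next token from a fixed-size context. The `Event` class, its field names, the token format, and the feature order are all hypothetical choices for illustration, not the paper's actual encoding.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Event:
    # Hypothetical field names and order; the summary lists these 11 features
    # but does not specify how the paper names or orders them.
    event_type: str
    is_goal: int
    is_accurate: int
    is_home: int
    half: int
    minute: int
    second: int
    x: int
    y: int
    home_score: int
    away_score: int


def tokenize(event):
    """Turn one event into a list of 'name=value' tokens, one per feature."""
    return [f"{f.name}={getattr(event, f.name)}" for f in fields(event)]


def tokenize_match(events):
    """Concatenate the tokens of consecutive events into one sequence."""
    tokens = []
    for event in events:
        tokens.extend(tokenize(event))
    return tokens


def next_token_examples(tokens, context_size):
    """Build (context, target) pairs: predict token i from the tokens before it."""
    return [
        (tokens[i - context_size:i], tokens[i])
        for i in range(context_size, len(tokens))
    ]


# Illustrative values only.
first = Event("pass", 0, 1, 1, 1, 12, 30, 45, 60, 0, 0)
second = Event("shot", 1, 1, 1, 1, 12, 34, 92, 48, 0, 0)
pairs = next_token_examples(tokenize_match([first, second]), context_size=11)
```

With a context of 11 tokens (one full event), the first training pair asks the model to predict the type of the second event from every feature of the first. Later pairs predict the remaining features one at a time, which is the "chain of variables" the abstract describes.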
### Technical details
- **Dataset**: Use the publicly available Wyscout dataset, which contains match data from the top five European leagues (England, Spain, Germany, Italy, and France).
- **Feature extraction**: Extract 11 features for each event, including event type, whether it is a goal, whether it is accurate, whether it is the home team, half-time, minutes, seconds, X-coordinate, Y-coordinate, home team score, and away team score.
- **Model architecture**: Adopt a multi-layer perceptron (MLP) architecture with 3 hidden layers, each having 512 neurons. The lightweight version of the model has 2 hidden layers, each with 256 neurons.
- **Training process**: Use the Binary Cross-Entropy Loss (BCELoss) and Adam optimizer and train for 50 epochs.
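
To make the architecture and loss above concrete, the following sketch lists the dense-layer shapes of the two MLP variants and implements binary cross-entropy with mean reduction in plain Python. The hidden-layer sizes come from the summary. The input and output widths (`N_INPUTS`, `N_OUTPUTS`) are not stated there and are hypothetical placeholders, so the resulting parameter counts are illustrative only.

```python
import math


def mlp_layer_shapes(n_inputs, hidden_sizes, n_outputs):
    """Return (fan_in, fan_out) for every dense layer of the MLP."""
    sizes = [n_inputs, *hidden_sizes, n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))


def parameter_count(shapes):
    """Weights plus biases summed over all dense layers."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in shapes)


def binary_cross_entropy(targets, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 targets and predicted probabilities."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite at 0 and 1
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(targets)


# Assumed widths: the summary does not give the encoded input or output size.
N_INPUTS = 64
N_OUTPUTS = 32

standard = mlp_layer_shapes(N_INPUTS, [512, 512, 512], N_OUTPUTS)  # 3 x 512
light = mlp_layer_shapes(N_INPUTS, [256, 256], N_OUTPUTS)          # 2 x 256

standard_params = parameter_count(standard)
light_params = parameter_count(light)
uninformed_loss = binary_cross_entropy([1, 0], [0.5, 0.5])
```

Under these assumed widths the lightweight variant has roughly one sixth of the parameters of the standard one. A prediction of 0.5 for every output gives a loss of ln 2, which is a useful reference point when reading training curves.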
### Experimental results
- **Performance comparison**: Compared with the baseline model and previous LEMs, the new model shows significant improvement on multiple metrics, especially in event type prediction accuracy and spatial coordinate prediction error.
- **Model inspection**: The accuracy of the model in predicting the next event position is verified by analyzing the transition matrix.
- **Expected goals map**: Shows the expected goals maps of different models in specific match states, revealing the performance differences of the models in different contexts.
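
The transition-matrix inspection mentioned above can be sketched as follows. The summary does not describe how the paper builds its matrix, so this is a generic construction under stated assumptions: the pitch is split into a coarse grid of zones, coordinates run from 0 to 100 on both axes, and each row of the matrix is the empirical distribution of the next event's zone given the current one. The grid size and the function names `zone_of` and `transition_matrix` are hypothetical.

```python
from collections import Counter


def zone_of(x, y, n_cols=3, n_rows=2, pitch_max=100):
    """Map a position to a zone index on an n_rows x n_cols grid (assumed 0-100 scale)."""
    col = min(int(x * n_cols / pitch_max), n_cols - 1)
    row = min(int(y * n_rows / pitch_max), n_rows - 1)
    return row * n_cols + col


def transition_matrix(positions, n_cols=3, n_rows=2):
    """Row-normalised counts of moves from one event's zone to the next event's zone."""
    n_zones = n_cols * n_rows
    zones = [zone_of(x, y, n_cols, n_rows) for x, y in positions]
    counts = Counter(zip(zones, zones[1:]))
    matrix = []
    for src in range(n_zones):
        row_total = sum(counts[(src, dst)] for dst in range(n_zones))
        matrix.append([
            counts[(src, dst)] / row_total if row_total else 0.0
            for dst in range(n_zones)
        ])
    return matrix


# Illustrative positions only.
observed = [(10, 10), (50, 10), (90, 10), (50, 10), (90, 80)]
matrix = transition_matrix(observed)
```

One way to use such a matrix is to build it once from recorded events and once from model-generated events, then compare the two row by row. Large differences point to zones where the simulated play moves differently from real matches.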
### Conclusion
This paper significantly improves event prediction in football matches by introducing a new method inspired by LLMs. The new method not only improves prediction accuracy but also simplifies the model architecture, providing new tools and methods for football data analysis.