Geospatial Road Cycling Race Results Data Set

Bram Janssens, Luca Pappalardo, Jelle De Bock, Matthias Bogaert, Steven Verstockt
2024-09-26
Abstract: The field of cycling analytics has only recently started to develop due to limited access to open data sources. Accordingly, research and data sources are very divergent, with large differences in information used across studies. To improve this, and facilitate further research in the field, we propose the publication of a data set which links thousands of professional race results from the period 2017-2023 to detailed geographic information about the courses, an essential aspect in road cycling analytics. Initial use cases are proposed, showcasing the usefulness in linking these two data sources.
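The abstract does not say which course features the data set derives from its geographic information. As a hedged illustration of what "detailed geographic information about the courses" can mean in practice, the following minimal Python sketch computes total distance (haversine formula) and cumulative elevation gain from a GPX 1.1 file using only the standard library. The helpers `haversine_m` and `course_profile` and the sample course are hypothetical and are not taken from the paper.

```python
import math
import xml.etree.ElementTree as ET

GPX_NS = "{http://www.topografix.com/GPX/1/1}"  # GPX 1.1 XML namespace
EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def course_profile(gpx_text):
    """Hypothetical helper: return (distance_km, elevation_gain_m) of a GPX 1.1 course."""
    root = ET.fromstring(gpx_text)
    # Use track points if present, otherwise fall back to route points.
    nodes = list(root.iter(GPX_NS + "trkpt")) or list(root.iter(GPX_NS + "rtept"))
    points = []
    for node in nodes:
        ele = node.find(GPX_NS + "ele")
        elevation = float(ele.text) if ele is not None else None
        points.append((float(node.get("lat")), float(node.get("lon")), elevation))

    distance_m, gain_m = 0.0, 0.0
    for (lat1, lon1, e1), (lat2, lon2, e2) in zip(points, points[1:]):
        distance_m += haversine_m(lat1, lon1, lat2, lon2)
        # Count only climbing; raw GPS elevation is noisy and is not smoothed here.
        if e1 is not None and e2 is not None and e2 > e1:
            gain_m += e2 - e1
    return distance_m / 1000.0, gain_m


# Made-up three-point course along the equator, for illustration only.
SAMPLE_GPX = """<gpx xmlns="http://www.topografix.com/GPX/1/1" version="1.1" creator="example">
  <trk><trkseg>
    <trkpt lat="0.0" lon="0.00"><ele>10</ele></trkpt>
    <trkpt lat="0.0" lon="0.01"><ele>25</ele></trkpt>
    <trkpt lat="0.0" lon="0.02"><ele>20</ele></trkpt>
  </trkseg></trk>
</gpx>"""

distance_km, gain_m = course_profile(SAMPLE_GPX)
print(f"{distance_km:.2f} km, {gain_m:.0f} m of climbing")  # prints "2.22 km, 15 m of climbing"
```

Summing only positive elevation differences is the simplest definition of elevation gain; because GPS altitude is noisy, a real pipeline would likely smooth or resample the profile first, which this sketch deliberately omits.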
Computers and Society, Machine Learning
What problem does this paper attempt to address?
This paper addresses the fragmentation and lack of standardization of data in professional road cycling analytics. To that end, the authors compile and release a data set that links the results of thousands of professional races from 2017 to 2023 to detailed geographic information about the race courses. The study pursues the following main objectives:

1. **Data fragmentation and inconsistency**
   - The data sources used in current cycling analytics research are highly scattered, and the information used differs considerably between studies, which makes cross-study comparison and integration difficult.
   - The paper notes that personal data protection and privacy regulations (such as the GDPR) have made it harder to obtain high-quality, standardized data.
2. **Neglect of course information**
   - Results and tactical choices in road cycling races depend heavily on course characteristics (such as terrain and road surface type), yet most previous studies have ignored this information.
   - Different data sources name the same race differently, which makes it hard to merge them at scale.
3. **Promoting data-driven research and innovation**
   - By providing a standardized data set, the authors aim to lower the barrier to entry to the field, making comparative studies easier and supporting the development of new analysis methods.
   - Because the data set contains detailed course information alongside race results, it supports a more complete assessment of rider performance, the identification of potential talent, and the development of new applications.
4. **Data matching and anonymization challenges**
   - The authors link GPX files to race result data using fuzzy matching techniques to keep the merged data consistent and accurate.
   - To address privacy concerns, the data has been anonymized where appropriate to prevent unauthorized disclosure of personal information.

In short, the paper tackles data fragmentation, the lack of standardization, and the neglect of course information in current cycling analytics, and aims to enable deeper and broader research by providing a comprehensive, standardized data set.
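The summary states that GPX files are linked to race results through fuzzy matching, partly because different sources name the same race differently, but it does not specify the matching method. The sketch below is one plausible approach, not the authors' pipeline: it normalizes names (strips accents, punctuation, and four-digit years) and then scores candidates with Python's standard `difflib.SequenceMatcher.ratio()`. The helper names `normalize` and `match_race`, the 0.8 threshold, and the file-style name `Giro-d-Italia_2022` are illustrative assumptions.

```python
import difflib
import re
import unicodedata


def normalize(name):
    """Lower-case, strip accents, drop punctuation and 4-digit years, collapse spaces."""
    name = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in name if not unicodedata.combining(ch))
    name = re.sub(r"[^a-z0-9]+", " ", name.lower())  # punctuation/underscores -> space
    name = re.sub(r"\b(19|20)\d{2}\b", " ", name)     # remove years such as 2022
    return " ".join(name.split())


def match_race(gpx_name, result_names, threshold=0.8):
    """Hypothetical helper: return (best_result_name, score), or None below threshold."""
    target = normalize(gpx_name)
    best, best_score = None, 0.0
    for candidate in result_names:
        score = difflib.SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best is None or best_score < threshold:
        return None
    return best, best_score


RESULT_NAMES = ["Tour de France", "Giro d'Italia", "Vuelta a España", "Paris-Roubaix"]
print(match_race("Giro-d-Italia_2022", RESULT_NAMES))  # ("Giro d'Italia", 1.0)
```

Normalizing before scoring removes differences that carry no information (accents, separators, edition years), so the similarity score reflects the actual name. A real linkage step would probably also check the race date or edition and enforce a one-to-one assignment between GPX files and races, which this greedy best-match sketch does not do.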