XXLTraffic: Expanding and Extremely Long Traffic Dataset for Ultra-Dynamic Forecasting Challenges

Du Yin, Hao Xue, Arian Prabowo, Shuang Ao, Flora Salim
2024-06-18
Abstract: Traffic forecasting is crucial for smart cities and intelligent transportation initiatives, where deep learning has made significant progress in modeling complex spatio-temporal patterns in recent years. However, current public datasets have limitations in reflecting the ultra-dynamic nature of real-world scenarios, characterized by continuously evolving infrastructures, varying temporal distributions, and temporal gaps due to sensor downtimes or changes in traffic patterns. These limitations inevitably restrict the practical applicability of existing traffic forecasting datasets. To bridge this gap, we present XXLTraffic, the largest available public traffic dataset with the longest timespan and increasing number of sensor nodes over the multiple years observed in the data, curated to support research in ultra-dynamic forecasting. Our benchmark includes both typical time-series forecasting settings with hourly and daily aggregated data and novel configurations that introduce gaps and down-sample the training size to better simulate practical constraints. We anticipate the new XXLTraffic will provide a fresh perspective for the time-series and traffic forecasting communities. It would also offer a robust platform for developing and evaluating models designed to tackle ultra-dynamic and extremely long forecasting problems. Our dataset supplements existing spatio-temporal data resources and leads to new research directions in this domain.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper primarily addresses the following issues:

1. **Limitations of Existing Datasets**:
   - Existing public traffic datasets do not reflect the ultra-dynamic characteristics of real-world scenarios: continuously evolving infrastructure, shifting temporal distributions, and temporal gaps caused by sensor downtime or changes in traffic patterns.
   - These limitations restrict the practical applicability of existing traffic forecasting datasets.
2. **Need for Ultra-Long Time Span Datasets**:
   - The paper proposes XXLTraffic, described as the largest public traffic dataset to date, with the longest time span and a number of sensor nodes that grows over the years observed. The dataset is curated to support ultra-dynamic forecasting research.
   - The benchmark includes typical time-series forecasting settings (hourly and daily aggregated data) as well as configurations that introduce gaps and down-sample the training size to better simulate real-world constraints.
3. **Challenges of Ultra-Dynamic Prediction**:
   - The ultra-dynamic challenge has three key aspects: infrastructure that keeps evolving over time, temporal distributions that change over extremely long observation periods, and spatio-temporal dynamics interrupted by temporal gaps.
   - These factors require models to adapt to changing patterns and trends over long time spans and to handle inconsistencies and interruptions in the data.

By proposing the XXLTraffic dataset, the paper aims to give the time-series and traffic forecasting communities a new perspective and a robust platform for developing and evaluating models that address ultra-dynamic and extremely long forecasting problems. The dataset also complements existing spatio-temporal data resources and points to new research directions in the field.
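
The benchmark settings mentioned above (aggregated data, gaps between observed history and forecast target, and a down-sampled training set) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 12-readings-per-hour aggregation factor, and the window lengths are all hypothetical choices made for illustration, and the data is synthetic.

```python
from statistics import mean


def aggregate(readings, factor):
    """Average consecutive groups of `factor` readings.

    Assumption: with 5-minute raw readings, factor=12 gives hourly values.
    A trailing incomplete group is dropped.
    """
    usable = len(readings) - len(readings) % factor
    return [mean(readings[i:i + factor]) for i in range(0, usable, factor)]


def gapped_windows(series, history_len, gap_len, horizon_len):
    """Yield (history, target) pairs in which the target window starts
    `gap_len` steps after the history window ends."""
    total = history_len + gap_len + horizon_len
    for start in range(len(series) - total + 1):
        history = series[start:start + history_len]
        target_start = start + history_len + gap_len
        yield history, series[target_start:target_start + horizon_len]


def downsample_training(windows, keep_every):
    """Keep every `keep_every`-th window to shrink the training set."""
    return windows[::keep_every]


# Synthetic stand-in data: 48 readings, treated as 4 hours of 5-minute samples.
raw = [float(i) for i in range(48)]
hourly = aggregate(raw, 12)

# Toy series of 10 steps: 3 steps of history, a 2-step gap, a 2-step target.
pairs = list(gapped_windows(list(range(10)), history_len=3, gap_len=2, horizon_len=2))
small = downsample_training(pairs, keep_every=2)
```

In the toy series, the first pair uses steps 0 to 2 as history and steps 5 to 6 as the target, so steps 3 and 4 are never seen by the model. Larger `gap_len` values make the task harder in the way the gap configurations are meant to probe.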