Uncertainty Quantification via Spatial-Temporal Tweedie Model for Zero-inflated and Long-tail Travel Demand Prediction

Xinke Jiang, Dingyi Zhuang, Xianghui Zhang, Hao Chen, Jiayuan Luo, Xiaowei Gao
DOI: https://doi.org/10.1145/3583780.3615215
2024-01-31
Abstract: Understanding Origin-Destination (O-D) travel demand is crucial for transportation management. However, traditional spatial-temporal deep learning models grapple with addressing the sparse and long-tail characteristics in high-resolution O-D matrices and quantifying prediction uncertainty. This dilemma arises from the numerous zeros and over-dispersed demand patterns within these matrices, which challenge the Gaussian assumption inherent to deterministic deep learning models. To address these challenges, we propose a novel approach: the Spatial-Temporal Tweedie Graph Neural Network (STTD). The STTD introduces the Tweedie distribution as a compelling alternative to the traditional 'zero-inflated' model and leverages spatial and temporal embeddings to parameterize travel demand distributions. Our evaluations using real-world datasets highlight STTD's superiority in providing accurate predictions and precise confidence intervals, particularly in high-resolution scenarios.
Machine Learning, Other Statistics
What problem does this paper attempt to address?
This paper addresses two challenges that existing spatio-temporal deep learning models face on high-resolution Origin-Destination (O-D) matrices: the sparse, long-tail structure of the data, and the quantification of predictive uncertainty. Specifically:

1. **Sparsity and Long-Tail Characteristics**: High-resolution O-D matrices contain a large number of zeros and over-dispersed demand patterns. Deterministic deep learning models built on Gaussian assumptions struggle to model such data accurately.
2. **Predictive Uncertainty**: Traditional models mainly focus on coarse-grained time resolutions and usually simplify the variance structure by assuming homoscedasticity (i.e., constant variance). This can overlook key features of the data and fail to account for potential biases and real-world uncertainty.

To address these problems, the authors propose a new model, the Spatial-Temporal Tweedie Graph Neural Network (STTD). Its main improvements are:

- **Introducing the Tweedie Distribution**: It replaces the traditional "zero-inflated" model and captures both the zero inflation and the long-tail non-zero values in O-D travel data.
- **Spatio-Temporal Uncertainty Quantification**: Spatial and temporal embeddings are combined to parameterize the travel demand distribution, which allows the model to better quantify spatio-temporal uncertainty in sparse travel demand data.
### Formula Representation

The Tweedie distribution belongs to the exponential dispersion family, with density:

\[ f_{TD}(x_{it} \mid \theta, \phi) = a(x_{it}, \phi) \exp\left( \frac{x_{it}\theta - \kappa(\theta)}{\phi} \right) \]

where:

- \(\theta \in \mathbb{R}\) is the natural parameter,
- \(\phi \in \mathbb{R}^+\) is the dispersion parameter,
- \(a(\cdot)\) and \(\kappa(\cdot)\) are normalizing functions associated with \(\phi\) and \(\theta\), respectively (\(\kappa\) is the cumulant function).

The mean and variance of the Tweedie distribution are:

\[ E(x) = \mu = \kappa'(\theta) \]

\[ Var(x) = \phi \kappa''(\theta) \]

For the Tweedie family the variance is a power of the mean, \(Var(x) = \phi \mu^{\rho}\), where \(\rho\) is the power (index) parameter. For the case of \(1 < \rho < 2\), the distribution is a compound Poisson-Gamma, and the demand \(x_{it}\) can be represented as:

\[ x_{it} = \begin{cases} 0 & \text{if } L_{it} = 0 \text{ (no trips)} \\ \sum_{j = 1}^{L_{it}} l_{it}^{(j)} & \text{otherwise} \end{cases} \]

where \(L_{it}\) follows a Poisson distribution \(Pois(\lambda)\), and the \(l_{it}^{(j)}\) are independent Gamma random variables \(Gamma(\alpha, \gamma)\).

### Conclusion

By combining the Tweedie distribution with a spatio-temporal graph neural network, STTD predicts travel demand more accurately and quantifies its uncertainty, performing especially well in high-resolution scenarios. Experimental results show that STTD outperforms existing methods across multiple spatio-temporal resolutions and performance metrics, with notable advantages on sparse and long-tail data.
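
The compound Poisson-Gamma representation given under Formula Representation can be checked numerically. The following is a minimal sketch, not the STTD implementation: it assumes that \(\rho\) is the Tweedie power parameter (so that \(Var(x) = \phi \mu^{\rho}\)), that \(\gamma\) in \(Gamma(\alpha, \gamma)\) is a scale parameter (the summary does not say whether it is a scale or a rate), and it uses the standard mapping \(\lambda = \mu^{2-\rho} / (\phi (2-\rho))\), \(\alpha = (2-\rho)/(\rho-1)\), \(\gamma = \phi (\rho-1) \mu^{\rho-1}\). The function names `tweedie_to_compound_poisson` and `sample_tweedie` are hypothetical.

```python
import numpy as np


def tweedie_to_compound_poisson(mu, phi, rho):
    """Map Tweedie (mu, phi, rho) with 1 < rho < 2 to (lambda, alpha, gamma).

    gamma is treated as a Gamma *scale* parameter (an assumption).
    """
    if not 1.0 < rho < 2.0:
        raise ValueError("rho must lie strictly between 1 and 2")
    lam = mu ** (2.0 - rho) / (phi * (2.0 - rho))   # Poisson rate
    alpha = (2.0 - rho) / (rho - 1.0)               # Gamma shape
    gamma = phi * (rho - 1.0) * mu ** (rho - 1.0)   # Gamma scale
    return lam, alpha, gamma


def sample_tweedie(mu, phi, rho, size, rng):
    """Draw samples via the compound Poisson-Gamma representation."""
    lam, alpha, gamma = tweedie_to_compound_poisson(mu, phi, rho)
    counts = rng.poisson(lam, size=size)            # L_it: number of Gamma terms
    x = np.zeros(size)                              # L_it = 0 gives an exact zero
    pos = counts > 0
    # A sum of n independent Gamma(alpha, gamma) draws is Gamma(n * alpha, gamma).
    x[pos] = rng.gamma(shape=counts[pos] * alpha, scale=gamma)
    return x


rng = np.random.default_rng(0)
mu, phi, rho = 0.5, 1.0, 1.5
samples = sample_tweedie(mu, phi, rho, size=200_000, rng=rng)
lam, _, _ = tweedie_to_compound_poisson(mu, phi, rho)

print("empirical mean:", samples.mean(), "target:", mu)
print("empirical var: ", samples.var(), "target:", phi * mu ** rho)
print("zero fraction: ", (samples == 0).mean(), "target:", np.exp(-lam))
```

With a small mean, a large share of the samples are exactly zero (probability \(e^{-\lambda}\)), while the positive part is right-skewed. This is the combination of zero inflation and a long tail that motivates using a single Tweedie distribution in place of a separate zero-inflated model.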