Abstract: Various probabilistic time series forecasting models have sprung up and shown remarkably good performance. However, the choice of model highly relies on the characteristics of the input time series and the fixed distribution that the model is based on. Because probability distributions cannot be averaged over different models straightforwardly, current time series model ensemble methods cannot be directly applied to improve the robustness and accuracy of forecasting. To address this issue, we propose pTSE, a multi-model distribution ensemble method for probabilistic forecasting based on the Hidden Markov Model (HMM). pTSE only takes off-the-shelf outputs from member models without requiring further information about each model. Besides, we provide a complete theoretical analysis of pTSE to prove that the empirical distribution of time series subject to an HMM will converge to the stationary distribution almost surely. Experiments on benchmarks show the superiority of pTSE over all member models and competitive ensemble methods.
What problem does this paper attempt to address?
### Problems the paper attempts to solve
The paper addresses model ensembling for probabilistic time-series forecasting. Existing time-series ensemble methods cannot be applied directly to improve the robustness and accuracy of probabilistic forecasts, because the probability distributions produced by different models cannot simply be averaged. To overcome this, the authors propose pTSE (probabilistic Time Series Ensemble), a multi-model distribution ensemble method based on the Hidden Markov Model (HMM).
### Main contributions
1. **Propose pTSE**: pTSE is a multi-model ensemble method for probabilistic time-series forecasting. It only requires the off-the-shelf outputs of the member models and no further information about each model.
2. **Theoretical verification**: The authors analyze the ensemble distribution found by pTSE and prove that the empirical distribution of a time series governed by an HMM converges almost surely to the stationary distribution, that is, the distribution the time series approximately follows over any time period.
3. **Empirical results**: Experiments on real-world datasets show that pTSE outperforms every individual member model as well as competing ensemble methods designed for point-estimate models.
### Method overview
#### 2.1 Preliminaries
- **HMM**: A Hidden Markov Model (HMM) describes the joint probability of a sequence of random variables. Each observed variable \(O_t\), which can be continuous or discrete, is paired with a hidden state \(S_t\). An HMM satisfies the following conditions:
\[
p(S_{t + 1}|S_1,\ldots,S_t)=p(S_{t + 1}|S_t)
\]
\[
p(O_t|S_1,\ldots,S_T,O_1,\ldots,O_T)=p(O_t|S_t)
\]
- **HMM fitting**: Fitting an HMM with \(K\) hidden states requires determining the following parameters:
  - Transition matrix \(A=(a_{ij})_{1\leq i,j\leq K}\), where \(a_{ij}=p(S_{t + 1}=j|S_t = i)\)
  - Set of emission function parameters \(\Theta=\{\theta_k\}_{k = 1}^K\), where \(\theta_k\) parameterizes the emission function \(f_k(O_t;\theta_k)\) of state \(k\)
  - Initial distribution \(\pi=(\pi_1,\ldots,\pi_K)\), where \(\pi_k=p(S_0 = k)\)

  These parameters are usually fitted by maximum likelihood estimation (MLE) or an equivalent formulation:
\[
\arg\max_{A,\pi,\Theta}p(\{O_t\}_{t = 1}^T|A,\pi,f_k(O_t;\theta_k\in\Theta))
\]
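The likelihood being maximized above can be evaluated with the standard forward algorithm. The following is a minimal sketch, not the paper's implementation. It assumes Gaussian emission functions, treats \(\pi\) as the distribution of the hidden state paired with the first observation, and uses hypothetical names (`gaussian_pdf`, `hmm_log_likelihood`, `thetas`) that do not come from the paper.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x (assumed emission family)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def hmm_log_likelihood(obs, A, pi, thetas):
    """Log-likelihood log p(O_1..O_T | A, pi, Theta) via the scaled forward algorithm.

    obs    : list of T observations
    A      : K x K transition matrix, A[i][j] = p(S_{t+1}=j | S_t=i)
    pi     : length-K distribution of the first hidden state (assumption)
    thetas : thetas[k] = (mean, std) of the Gaussian emission of state k
    """
    K = len(pi)
    # alpha[k] is proportional to p(O_1..O_t, S_t = k)
    alpha = [pi[k] * gaussian_pdf(obs[0], *thetas[k]) for k in range(K)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through A, then weight by the emission density.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(K))
                * gaussian_pdf(obs[t], *thetas[j])
                for j in range(K)
            ]
        # Rescale to avoid underflow; the log of each scale factor
        # accumulates into the total log-likelihood.
        norm = sum(alpha)
        log_lik += math.log(norm)
        alpha = [a / norm for a in alpha]
    return log_lik
```

An MLE fit would search over `A`, `pi`, and `thetas` to maximize this value, for example with the Baum-Welch (EM) procedure. When every row of `A` equals `pi`, the hidden states are independent and the result reduces to the log-likelihood of a plain mixture model, which makes a convenient sanity check.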
#### 2.2 Framework foundation
- **Problem definition**: Probabilistic forecasting usually requires estimating the conditional distribution \(p(y_t|M(X_t))\) given a trained model \(M\) and the feature vector \(X_t\).
- **Model assumption**: Suppose there are \(K\) probabilistic forecasting models \(\{M_k\}_{k = 1}^K\), each fitted independently to the same dataset \(\{y_t\}_{t = 1}^T\). At each time point \(t\), there is an optimal model \(M_{k_t}\) such that the distribution of \(y_t\) is determined by \(M_{k_t}(X_t)\), that is, \(y_t\sim p(y_t|M_{k_t}(X_t))\).
- **Model transition**: For \(y_{t + 1}\), assume that the optimal model switches randomly from \(M_{k_t}\) to a new optimal model \(M_{k_{t+1}}\) with probability \(p_{k_t,k_{t+1}}\), so the sequence of optimal model indices \(k_t\) forms a Markov process.
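The switching assumption above can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the paper's procedure: each member model is replaced by a fixed Gaussian forecast that ignores \(X_t\), and the transition matrix `P` and all function names are hypothetical. It generates a series in which the optimal model index follows a Markov chain, so the long-run share of time each model is optimal can be compared with the chain's stationary distribution.

```python
import random


def stationary_distribution(P, iters=1000):
    """Approximate the stationary distribution of transition matrix P by power iteration."""
    K = len(P)
    dist = [1.0 / K] * K
    for _ in range(iters):
        dist = [sum(dist[i] * P[i][j] for i in range(K)) for j in range(K)]
    return dist


def simulate_switching_series(P, member_params, T, seed=0):
    """Draw y_1..y_T where the optimal model index k_t follows a Markov chain.

    P             : K x K matrix, P[i][j] = probability of switching from model i to model j
    member_params : member_params[k] = (mean, std) of model k's Gaussian forecast
                    (a stand-in for p(y_t | M_k(X_t)); assumption)
    """
    rng = random.Random(seed)
    K = len(P)
    k = rng.randrange(K)  # arbitrary initial optimal model
    states, ys = [], []
    for _ in range(T):
        mean, std = member_params[k]
        states.append(k)
        ys.append(rng.gauss(mean, std))  # y_t drawn from the current optimal model
        k = rng.choices(range(K), weights=P[k])[0]  # Markov transition to the next optimal model
    return states, ys


P = [[0.9, 0.1], [0.2, 0.8]]
states, ys = simulate_switching_series(P, [(0.0, 1.0), (5.0, 2.0)], T=20000)
empirical = [states.count(k) / len(states) for k in range(len(P))]
stationary = stationary_distribution(P)
```

For a long enough series, `empirical` should be close to `stationary`, which mirrors the convergence result stated in the abstract.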
#### 2.3 Mixed quantile estimation
- **Problem definition**: For probabilistic prediction methods, usually the PDF \(f_{X_t}^{M_k}(y_t)\) is not directly evaluated, but instead the...