Abstract: State inference and parameter learning in sequential models can be successfully performed with approximation techniques that maximize the evidence lower bound (ELBO) to the marginal log-likelihood of the data distribution. These methods may be referred to as Dynamical Variational Autoencoders, and our specific focus lies on the deep Kalman filter (DKF). It has been shown that the ELBO objective can oversimplify data representations, potentially compromising estimation quality. Tighter Monte Carlo objectives (MCOs) have been proposed in the literature to enhance generative modeling performance. For instance, the IWAE objective uses importance weights to reduce the variance of marginal log-likelihood estimates. In this paper, importance sampling is applied to the DKF framework for learning deep Markov models, resulting in the IW-DKF, which shows an improvement in terms of log-likelihood estimates and KL divergence between the variational distribution and the transition model. The framework using the sampled DKF update rule is also accommodated to address sequential state and parameter estimation when working with highly non-linear physics-based models. An experiment with the 3-space Lorenz attractor shows an enhanced generative modeling performance and also a decrease in RMSE when estimating the model parameters and latent states, indicating that tighter MCOs lead to improved state inference performance.
What problem does this paper attempt to address?
The paper addresses the risk that training deep sequential state estimators with the standard variational autoencoder (VAE) objective, the evidence lower bound (ELBO), yields an overly simplified data representation. It focuses on the deep Kalman filter (DKF) framework and introduces tighter Monte Carlo objectives (MCOs), in particular the importance-weighted autoencoder (IWAE) objective, to improve both generative modeling performance and the quality of state estimation.
### Main research problems
1. **Simplified data representation**: The standard ELBO objective function may cause the model's representation of data to be overly simplified, which may affect the quality of state estimation.
2. **Improvement of generative modeling performance**: Improve the performance of the generative model by introducing tighter MCOs, such as the IWAE objective function.
3. **State and parameter estimation of nonlinear physical models**: Evaluate the influence of the IWAE objective function on state and parameter estimation when dealing with highly nonlinear physical models.
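
For reference, the relationship between the ELBO and the IWAE objective can be written as follows. The notation is an assumption, since the summary itself fixes none: $x$ stands for the observations and $z$ for the latent states (whole sequences in the sequential setting), $p_\theta$ is the generative model, $q_\phi$ is the variational distribution, and $K$ is the number of importance samples.

$$
\mathcal{L}_K = \mathbb{E}_{z^{(1)},\dots,z^{(K)} \sim q_\phi(z \mid x)}\left[\log \frac{1}{K}\sum_{k=1}^{K} \frac{p_\theta\big(x, z^{(k)}\big)}{q_\phi\big(z^{(k)} \mid x\big)}\right],
\qquad
\mathcal{L}_1 = \mathrm{ELBO},
\qquad
\mathcal{L}_K \le \mathcal{L}_{K+1} \le \log p_\theta(x).
$$

Increasing $K$ therefore never loosens the bound, which is the sense in which the IWAE objective is a "tighter" MCO than the ELBO.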
### Solutions
The paper proposes the importance-weighted deep Kalman filter (IW-DKF). The method applies importance sampling within the DKF framework and trains on a K-sample importance-weighted estimate of the marginal log-likelihood instead of the ELBO. Reported improvements include:
- **Generative modeling performance**: Experimental results show that IW-DKF improves generative modeling performance, both when learning deep Markov models (DMMs) and on the three-dimensional Lorenz attractor model.
- **State and parameter estimation**: On the three-dimensional Lorenz attractor model, IW-DKF estimates both the model parameters and the latent states more accurately, as measured by a lower root-mean-square error (RMSE).
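
To make the K-sample importance-weighted estimate concrete, the following is a minimal sketch on a toy one-dimensional Gaussian model, not the paper's DKF. Everything in it is an assumption chosen for illustration: the model (`z ~ N(0, 1)`, `x | z ~ N(z, 1)`), the deliberately mismatched proposal (the prior), and the helper names `log_normal`, `log_mean_exp`, and `iw_bound`. The toy model is convenient because its exact marginal, `x ~ N(0, 2)`, is known in closed form.

```python
import math
import random


def log_normal(x, mean, std):
    """Log-density of a univariate Gaussian N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2


def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def iw_bound(x, k, q_mean, q_std, n_rep=5000, seed=0):
    """Monte Carlo estimate of the K-sample importance-weighted bound.

    Toy model (an assumption for this sketch): z ~ N(0, 1), x | z ~ N(z, 1).
    Proposal: q(z | x) = N(q_mean, q_std^2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        log_w = []
        for _ in range(k):
            z = rng.gauss(q_mean, q_std)
            # log importance weight: log p(z) + log p(x | z) - log q(z | x)
            log_w.append(
                log_normal(z, 0.0, 1.0)
                + log_normal(x, z, 1.0)
                - log_normal(z, q_mean, q_std)
            )
        total += log_mean_exp(log_w)
    return total / n_rep


x_obs = 1.5
# Exact marginal log-likelihood of the toy model: x ~ N(0, 2).
exact = log_normal(x_obs, 0.0, math.sqrt(2.0))
# K = 1 recovers the ELBO; larger K gives a tighter bound.
bounds = {k: iw_bound(x_obs, k, q_mean=0.0, q_std=1.0) for k in (1, 5, 50)}

for k, value in bounds.items():
    print(f"K={k:3d}  bound={value:.3f}  gap to exact={exact - value:.3f}")
```

With `K = 1` the estimate is the ordinary ELBO. As `K` grows, the estimated bound should rise toward the exact log-likelihood and stay below it up to Monte Carlo noise.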
### Experimental verification
1. **DMM learning on the polyphonic music dataset**:
   - **Settings**: Use the polyphonic music dataset, where the training, validation, and test sets contain 220, 76, and 77 sequences respectively.
   - **Results**: As the number of samples K increases, the log-likelihood estimate of IW-DKF gradually increases and its standard deviation decreases significantly, indicating more stable estimates.
2. **State estimation of the three-dimensional Lorenz attractor model**:
   - **Settings**: Use the three-dimensional Lorenz attractor model, which is a nonlinear chaotic system.
   - **Results**: IW-DKF achieves a lower parameter estimation error and a lower state estimation RMSE than the DKF baseline.
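
As a rough illustration of the second experimental setting, the sketch below simulates the Lorenz system and computes an RMSE between two trajectories. It is not the paper's experimental code. The parameter values (`sigma = 10`, `rho = 28`, `beta = 8/3`) are the classic textbook choice and are assumed here, as are the RK4 integrator, the step size, the initial state, and the helper names `lorenz_rhs`, `rk4_step`, `simulate`, and `rmse`.

```python
import math


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz ODE (classic parameter values assumed)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt, **params):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz_rhs(state, **params)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2.0), **params)
    k3 = lorenz_rhs(shifted(state, k2, dt / 2.0), **params)
    k4 = lorenz_rhs(shifted(state, k3, dt), **params)
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, n_steps, dt=0.01, **params):
    """Return the trajectory as a list of n_steps + 1 states, including the start."""
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(state, dt, **params)
        trajectory.append(state)
    return trajectory


def rmse(traj_a, traj_b):
    """Root-mean-square error over all time steps and all three coordinates."""
    squared = [
        (a - b) ** 2
        for state_a, state_b in zip(traj_a, traj_b)
        for a, b in zip(state_a, state_b)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Reference trajectory versus one generated with a slightly wrong parameter.
truth = simulate((1.0, 1.0, 1.0), n_steps=500)
perturbed = simulate((1.0, 1.0, 1.0), n_steps=500, rho=28.5)
print(f"state RMSE under a perturbed rho: {rmse(truth, perturbed):.3f}")
```

Because the system is chaotic, even a small parameter error produces trajectories that drift apart over time, which is why state RMSE and parameter error are natural joint metrics for this benchmark.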
### Conclusions
By introducing IW-DKF, the paper mitigates the problem of oversimplified data representation in deep sequential state estimation and reports improvements in both generative modeling and state estimation. Future research directions include further comparing the performance of different MCOs in state estimation and methods for directly optimizing the variational distribution.