Sequential Estimation of Gaussian Process-based Deep State-Space Models

Yuhao Liu, Marzieh Ajirak, Petar Djuric
DOI: https://doi.org/10.1109/TSP.2023.3303648
2024-03-24
Abstract: We consider the problem of sequential estimation of the unknowns of state-space and deep state-space models that include estimation of functions and latent processes of the models. The proposed approach relies on Gaussian and deep Gaussian processes that are implemented via random feature-based Gaussian processes. In these models, we have two sets of unknowns, highly nonlinear unknowns (the values of the latent processes) and conditionally linear unknowns (the constant parameters of the random feature-based Gaussian processes). We present a method based on particle filtering where the parameters of the random feature-based Gaussian processes are integrated out in obtaining the predictive density of the states and do not need particles. We also propose an ensemble version of the method, with each member of the ensemble having its own set of features. With several experiments, we show that the method can track the latent processes up to a scale and rotation.
Machine Learning
What problem does this paper attempt to address?
This paper addresses how to construct and sequentially estimate state-space models and deep state-space models (DSSMs) based on Gaussian processes (GPs), especially when the latent processes are highly nonlinear and non-stationary. Specifically, it covers the following points:

1. **Sequential Estimation of Unknowns**: The paper studies how to sequentially estimate the unknown functions and latent processes in state-space and deep state-space models. These models contain two sets of unknowns: highly nonlinear unknowns (the values of the latent processes) and conditionally linear unknowns (the constant parameters of the random feature-based Gaussian processes).
2. **Efficient Computation Method**: The paper proposes a particle-filtering-based method in which the parameters of the random feature-based Gaussian processes are integrated out when obtaining the predictive density of the states, so these parameters do not need particles. Particles are spent only on the latent states, which reduces the computational burden.
3. **Application of Ensemble Learning**: The paper introduces an ensemble version of the method in which each member has its own set of features. The ensemble is intended to reduce the variance of the latent-process estimates and of the observation predictions, improving accuracy and robustness.
4. **Extension to Deep Structures**: The paper extends the traditional state-space model to a deep structure to increase model capacity and reveal more information about the studied phenomenon. The deep structure allows simple nonlinear activation functions to approximate an unknown, highly nonlinear target function.
5. **Dynamic Deep Probabilistic Latent Variable Model**: The paper constructs a dynamic deep probabilistic latent variable model in which the intermediate-layer variables are conditionally independent given the states of the deeper layers, and the dynamics are generated by the process at the deepest layer.
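As a sketch of what "integrated out" means here, consider a single scalar state dimension. The notation below is assumed for illustration and is not taken from the paper: \(\phi(\cdot)\) is a fixed random-feature vector, \(\theta\) and \(\sigma^2\) are the conditionally linear parameters, and \(x_{0:t-1}\) is the trajectory of one particle.

\[ x_t = \phi(x_{t-1})^{\top}\theta + u_t, \qquad u_t \sim \mathcal{N}(0, \sigma^2) \]

\[ p(x_t \mid x_{0:t-1}) = \int p(x_t \mid x_{t-1}, \theta, \sigma^2)\, p(\theta, \sigma^2 \mid x_{0:t-1})\, d\theta\, d\sigma^2 \]

When the posterior of \((\theta, \sigma^2)\) is normal-inverse-gamma, this integral has a closed form (a Student-t density), so under these assumptions each particle only needs to carry the sufficient statistics of that posterior rather than samples of \(\theta\).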
### Formula Summary

- **Definition of Gaussian Process**:
  \[ p(f|X)=\mathcal{N}(f|m_X, K_{XX}) \]
  where \(m_X = [m(x_j)]_{j = 1}^J\) and \(K_{XX}=[\kappa(x_i,x_j)]_{i,j}\).
- **Predictive Distribution**:
  \[ p(f^*|X^*, f, X)=\mathcal{N}(f^*|\mu^*, \Sigma^*) \]
  where
  \[ \mu^* = m_{X^*}+K_{X^*X}K_{XX}^{-1}(f - m_X) \]
  \[ \Sigma^* = K_{X^*X^*}-K_{X^*X}K_{XX}^{-1}K_{XX^*} \]
- **Random Feature Representation**:
  \[ \phi(x)=\frac{1}{\sqrt{J}}[\sin(x^{\top}\omega_1),\cos(x^{\top}\omega_1),\ldots,\sin(x^{\top}\omega_J),\cos(x^{\top}\omega_J)]^{\top} \]
  where \(\Omega = \{\omega_j\}_{j = 1}^J\) is a set of samples drawn at random from the power spectral density of the Gaussian process kernel.
- **Bayesian Linear Regression**:
  \[ y=\phi^{\top}\theta+\epsilon \]
  where \(\epsilon\sim\mathcal{N}(0,\sigma^2)\), and the joint prior of \(\theta\) and \(\sigma^2\) is assumed to be a multivariate normal-inverse-gamma distribution.

With these tools, the paper proposes a novel kernel-based method for identifying nonlinear state-space systems without knowing the functions that govern the latent and observed processes. In addition, the state-space model is extended through ensemble learning and deep structures to increase model capacity and better capture the dynamics of complex systems.
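The random feature map and the Bayesian linear regression summarized above can be combined in a few lines of NumPy. The following is a minimal sketch, not the authors' implementation. It assumes a squared-exponential kernel (whose spectral density is Gaussian), a scalar output, and a prior \(\theta \mid \sigma^2 \sim \mathcal{N}(0, \sigma^2 I)\), \(\sigma^2 \sim \mathrm{IG}(a, b)\). The names `sample_frequencies`, `random_features`, and `NIGRegression` are hypothetical.

```python
import numpy as np


def sample_frequencies(dim, n_freq, lengthscale, rng):
    """Draw J frequencies from the spectral density of an RBF kernel.

    Assumption: k(x, x') = exp(-||x - x'||^2 / (2 * lengthscale^2)),
    whose spectral density is N(0, I / lengthscale^2).
    """
    return rng.normal(0.0, 1.0 / lengthscale, size=(n_freq, dim))


def random_features(x, omega):
    """Map x to [sin(x'w_1), cos(x'w_1), ..., sin(x'w_J), cos(x'w_J)] / sqrt(J)."""
    proj = omega @ x                      # shape (J,)
    feats = np.empty(2 * omega.shape[0])
    feats[0::2] = np.sin(proj)
    feats[1::2] = np.cos(proj)
    return feats / np.sqrt(omega.shape[0])


class NIGRegression:
    """Sequential Bayesian linear regression y = phi' theta + eps
    with a conjugate normal-inverse-gamma prior (hypothetical helper)."""

    def __init__(self, dim, a=2.0, b=1.0):
        self.Lam = np.eye(dim)    # prior precision of theta (in units of 1 / sigma^2)
        self.m = np.zeros(dim)    # prior mean of theta
        self.a = a                # inverse-gamma shape
        self.b = b                # inverse-gamma scale

    def predict(self, phi):
        """Student-t predictive of y: returns (mean, squared scale, degrees of freedom)."""
        mean = phi @ self.m
        scale2 = (self.b / self.a) * (1.0 + phi @ np.linalg.solve(self.Lam, phi))
        return mean, scale2, 2.0 * self.a

    def update(self, phi, y):
        """Absorb one (phi, y) pair into the posterior."""
        gain = 1.0 + phi @ np.linalg.solve(self.Lam, phi)
        resid = y - phi @ self.m
        Lam_new = self.Lam + np.outer(phi, phi)
        self.m = np.linalg.solve(Lam_new, self.Lam @ self.m + phi * y)
        self.Lam = Lam_new
        self.a += 0.5
        self.b += 0.5 * resid**2 / gain   # always non-negative increment
```

In a particle filter of the kind described above, `predict` would play the role of the closed-form predictive density used to propagate or weight a particle, and `update` would refresh that particle's sufficient statistics. That wiring is left out here because its details depend on the paper's algorithm, which this summary does not spell out.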