What problem does this paper attempt to address?

The problem that this paper attempts to solve is **latent state estimation and latent linear dynamics discovery based on image observations**. Specifically, the author aims to infer the latent states of the system from image observations and simultaneously discover the linear dynamic relationships between these latent states. ### Problem Background In many fields of natural science research (such as physics, biology, etc.), it is often necessary to estimate the states of dynamic systems that cannot be directly observed. To solve such problems, filtering algorithms are usually used to process noisy observation data. However, when the observation data is high - dimensional (such as images), traditional filtering algorithms may no longer be applicable because they usually assume that the observation model is low - dimensional and has a clear mathematical form. ### Main Contributions of the Paper 1. **Latent State Estimation**: The paper proposes a new method to estimate the latent states of the system from image observations. 2. **Latent Linear Dynamics Discovery**: At the same time, the paper also explores how to discover the linear dynamic relationships between latent states from image observations. ### Specific Methods To achieve the above goals, the paper introduces several key techniques: - **Autoencoders (AE)**: Used to map high - dimensional image observations to a low - dimensional latent space. - **Conditional Normalizing Flows (CNF)**: Used to model the observation likelihood function \(p(y_t|x_t)\) and allow the calculation of exact probability densities. - **Particle Filter (PF)**: Combined with CNF, used to estimate the posterior distribution of the latent states. ### Mathematical Model The system dynamics and observation model can be represented as: \[ x_t = A_{t - 1}x_{t - 1}+B_{t - 1}u_{t - 1}+q_{t - 1} \] \[ y_t = g(x_t) \] where \(x_t\) is the latent state, \(y_t\) is the observation value, \(u_t\) is the control input, \(q_t\sim N(0, Q_t)\) is zero - mean Gaussian noise, and \(g(\cdot)\) is a nonlinear observation model. By using conditional normalizing flows, the observation likelihood function can be represented as: \[ p(y_t|x_t)=p_u(u_t)\left|\det J_T(u_t)\right|^{-1} \] where \(u_t = T^{-1}(y_t)\), \(p_u(u_t)\) is the base distribution (such as a multivariate Gaussian distribution), and \(J_T(u_t)\) is the Jacobian matrix of the transformation \(T\). ### Conclusion The method proposed in the paper can estimate latent states from image observations and discover the linear dynamic relationships between these states. Although the experimental results show that this method has certain potential, it still needs further improvement and verification, especially in terms of performance on large - scale data sets and comparison with other methods. Through this method, researchers can use image observation data to infer the latent states of the system and their dynamic characteristics without explicit dynamic and observation models. This has important application prospects in fields such as robot vision and autonomous driving.

Simultaneous Latent State Estimation and Latent Linear Dynamics Discovery from Image Observations