Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?

Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra
2024-03-14
Abstract: We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, prior to this work it was unclear if such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory, and notably also an idea that is known to be empirically valuable for learning state representations.
Machine Learning, Systems and Control, Optimization and Control
What problem does this paper attempt to address?
The problem this paper attempts to solve is: **Can direct latent model learning solve the Linear Quadratic Gaussian (LQG) control problem?** Specifically, the paper explores how to control an unknown partially observable system by learning state representations from potentially high-dimensional observations. The authors adopt a direct latent model learning approach: a dynamic model in a latent state space is learned by predicting quantities directly related to planning (such as costs), without reconstructing the observations.

### Problem Background

1. **Control problems of partially observable systems**:
   - In a partially observable system, the true state cannot be observed directly and must be inferred from observations and control inputs.
   - LQG control is one of the most fundamental partially observable control problems, with both theoretical and practical significance.
2. **Limitations of existing methods**:
   - **Reconstruction-based methods**: Many existing methods learn state representations by reconstructing observations. This requires modeling high-dimensional, noisy data, and the reconstructed observations may contain information irrelevant to control.
   - **Model-free methods**: Model-free methods learn policies directly, but they usually require a large number of samples and tend to generalize poorly in complex tasks.

### Core contributions of the paper

- **Direct cost-driven state representation learning**: The paper studies a method that learns state representations by predicting multi-step cumulative costs, rather than by reconstructing observations or learning inverse models. This ties the learned representation directly to the control objective.
- **Finite-sample guarantee**: The authors prove that, given a finite number of samples, the method finds a near-optimal state representation function and a near-optimal controller. To the authors' knowledge, this is the first finite-sample guarantee for this kind of cost-driven latent model learning.

### Specific problem description

The paper focuses on the following question:

\[ \text{Can direct cost-driven state representation learning effectively solve the LQG control problem?} \]

To this end, the authors study a partially observable linear time-varying (LTV) dynamical system:

\[ x_{t + 1} = A_t^* x_t + B_t^* u_t + w_t, \quad y_t = C_t^* x_t + v_t, \]

where \( x_t \) is the state, \( y_t \) is the observation, \( u_t \) is the control input, and \( w_t \) and \( v_t \) are the process noise and observation noise, respectively. The goal is to learn a state representation from observations and control inputs and use it to find a control policy that minimizes the cumulative cost, where the cost at step \( t \) is quadratic:

\[ c_t(x, u) = \|x\|^2_{Q_t^*} + \|u\|^2_{R_t^*}. \]

### Conclusion

Through rigorous theoretical analysis, supported by experimental verification, the paper shows that direct cost-driven state representation learning can solve the LQG control problem with a finite number of samples. This result provides a theoretical foundation for cost-driven latent model learning in future research and applications.
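
To make the LTV system and the quadratic cost described above concrete, here is a minimal NumPy sketch that rolls out the dynamics and computes multi-step cumulative costs of the kind a cost-driven learner would predict. It is an illustration only, not the authors' algorithm: the function names `simulate_ltv` and `cumulative_costs`, the random system matrices, the noise scale, the window length `k`, and the truncation of windows at the horizon are all assumptions made for this example.

```python
import numpy as np

def simulate_ltv(A, B, C, Q, R, x0, us, rng, noise_std=0.1):
    """Roll out x_{t+1} = A_t x_t + B_t u_t + w_t, y_t = C_t x_t + v_t.

    Returns the observations y_t and the per-step costs
    c_t = x_t^T Q_t x_t + u_t^T R_t u_t for t = 0, ..., T-1.
    """
    x = x0
    ys, costs = [], []
    for t in range(len(us)):
        v = noise_std * rng.standard_normal(C[t].shape[0])  # observation noise
        ys.append(C[t] @ x + v)
        costs.append(x @ Q[t] @ x + us[t] @ R[t] @ us[t])
        w = noise_std * rng.standard_normal(x.shape[0])     # process noise
        x = A[t] @ x + B[t] @ us[t] + w
    return np.array(ys), np.array(costs)

def cumulative_costs(costs, k):
    """Sum the costs over windows [t, t+k), truncated at the horizon."""
    T = len(costs)
    return np.array([costs[t:min(t + k, T)].sum() for t in range(T)])

# Hypothetical small system: 2-dim state, 1-dim input, 4-dim observation.
rng = np.random.default_rng(0)
T, dx, du, dy = 5, 2, 1, 4
A = [0.9 * np.eye(dx) for _ in range(T)]
B = [rng.standard_normal((dx, du)) for _ in range(T)]
C = [rng.standard_normal((dy, dx)) for _ in range(T)]
Q = [np.eye(dx) for _ in range(T)]
R = [np.eye(du) for _ in range(T)]
us = rng.standard_normal((T, du))  # random exploratory inputs

ys, costs = simulate_ltv(A, B, C, Q, R, rng.standard_normal(dx), us, rng)
targets = cumulative_costs(costs, k=3)  # multi-step cost regression targets
```

In this sketch a learner would see only `ys`, `us`, and the costs, never the states; the `targets` array is what a representation of the observation and input history would be trained to predict.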