Abstract: Nonlinear optimal control is vital for numerous applications but remains challenging for unknown systems due to the difficulties in accurately modelling dynamics and handling computational demands, particularly in high-dimensional settings. This work develops a theoretically certifiable framework that integrates a modified Koopman operator approach with model-based reinforcement learning to address these challenges. By relaxing the requirements on observable functions, our method incorporates nonlinear terms involving both states and control inputs, significantly enhancing system identification accuracy. Moreover, by leveraging the power of neural networks to solve partial differential equations (PDEs), our approach achieves stabilizing control for high-dimensional dynamical systems of up to 9 dimensions. The learned value function and control laws are proven to converge to those of the true system at each iteration. Additionally, the accumulated cost of the learned control closely approximates that of the true system, with errors ranging from $10^{-5}$ to $10^{-3}$.
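The effect of relaxing the observables can be illustrated with a small regression experiment. The sketch below is not the paper's modified Koopman procedure. It is a generic least-squares fit of \(\dot{x}\) on a dictionary of observables, in the spirit of extended dynamic mode decomposition (EDMD). The scalar system, the two dictionaries, and the helper names `true_dynamics` and `fit_generator` are all hypothetical. When \(g(x)\) depends on the state, a model in which the input enters only linearly cannot reproduce the term \(g(x)u\). Adding a state-input cross term \(xu\) to the dictionary fixes this.

```python
import numpy as np

# Hypothetical control-affine system, used only to generate training data:
#   x_dot = f(x) + g(x) u  with  f(x) = -x + 0.5 x^2,  g(x) = 1 + 0.5 x
def true_dynamics(x, u):
    return -x + 0.5 * x**2 + (1.0 + 0.5 * x) * u


def fit_generator(features, x_dot):
    """Least-squares fit x_dot ~ features @ coeffs; returns coeffs and max residual."""
    coeffs, *_ = np.linalg.lstsq(features, x_dot, rcond=None)
    max_residual = np.max(np.abs(features @ coeffs - x_dot))
    return coeffs, max_residual


rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)
u = rng.uniform(-1.0, 1.0, size=200)
x_dot = true_dynamics(x, u)  # noise-free here; in practice estimated from trajectories

# Observables of the state only, with the input entering linearly.
linear_in_u = np.column_stack([x, x**2, u])
# Relaxed dictionary that also contains the state-input cross term x*u.
with_cross_term = np.column_stack([x, x**2, u, x * u])

coeffs_lin, err_lin = fit_generator(linear_in_u, x_dot)
coeffs_cross, err_cross = fit_generator(with_cross_term, x_dot)

print("linear-in-u max residual:", err_lin)
print("cross-term max residual: ", err_cross)
print("recovered coefficients:  ", np.round(coeffs_cross, 6))
```

With noise-free derivative data, the cross-term dictionary recovers the coefficients of \(f\) and \(g\), approximately \([-1, 0.5, 1, 0.5]\), up to rounding. The linear-in-\(u\) fit leaves a clearly nonzero residual. With real data, \(\dot{x}\) would have to be estimated from trajectories, and neither residual would vanish exactly.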
What problem does this paper attempt to address?
The paper addresses nonlinear optimal control, in particular the control of unknown nonlinear dynamical systems. Specifically, it aims to develop a theoretically certifiable framework that combines a modified Koopman operator method with model-based reinforcement learning to address the challenges posed by high-dimensional systems. The key problems it targets are:
1. **Accurate modeling of dynamical systems**: For unknown nonlinear systems, accurately modeling the dynamics is extremely challenging. Traditional linearization methods and methods that solve the Hamilton-Jacobi-Bellman (HJB) equation directly are ineffective in high-dimensional settings and have high computational complexity.
2. **Computational demands**: The computational demands of high-dimensional systems are large, especially when real-time control is required. Existing methods often struggle to remain computationally efficient while maintaining accuracy.
3. **Improved system identification accuracy**: By relaxing the requirements on observable functions, the method can include nonlinear terms involving both states and control inputs, which significantly improves system identification accuracy.
4. **Stabilizing control**: By using neural networks to solve partial differential equations (PDEs), the method achieves stabilizing control of dynamical systems with up to 9 dimensions. The learned value function and control law are proven to converge to their counterparts for the true system at each iteration.
5. **Approximation of the accumulated cost**: The accumulated cost of the control policy learned from data closely approximates that of the true system, with errors ranging from \(10^{-5}\) to \(10^{-3}\).
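The accumulated-cost comparison in item 5 can be made concrete in a case where the exact answer is known. The sketch below evaluates \(J=\int_0^\infty L(x(t),u(t))\,dt\) numerically for a feedback law. It uses a hypothetical scalar linear-quadratic system (\(\dot{x}=ax+bu\), \(L(x,u)=qx^2+ru^2\)). For this system the optimal value function is \(V(x)=Px^2\), where \(P\) solves a scalar Riccati equation, and the optimal feedback is \(u=-\frac{1}{2}r^{-1}b\,DV(x)\). The horizon, step size, and function names are assumptions. The trajectory is integrated with classical fourth-order Runge-Kutta, and the cost with the trapezoidal rule.

```python
import math

# Hypothetical scalar linear-quadratic special case of x_dot = f(x) + g(x) u:
#   f(x) = a*x, g(x) = b, running cost L(x, u) = q*x**2 + r*u**2 (so R = r).
a, b, q, r = 1.0, 1.0, 1.0, 1.0

# Here V(x) = P x^2, where P > 0 solves 2*a*P - (b**2 / r) * P**2 + q = 0.
P = (a + math.sqrt(a**2 + b**2 * q / r)) * r / b**2


def optimal_policy(x):
    # kappa*(x) = -1/2 R^{-1} g(x)^T DV(x)^T with DV(x) = 2 P x.
    return -0.5 / r * b * (2.0 * P * x)


def dynamics(x, u):
    return a * x + b * u


def running_cost(x, u):
    return q * x**2 + r * u**2


def accumulated_cost(x0, kappa, horizon=20.0, dt=1e-3):
    """Approximate int_0^horizon L(x(t), kappa(x(t))) dt along the closed loop.

    The state is integrated with classical RK4 and the integral with the
    trapezoidal rule; the tail beyond `horizon` is ignored.
    """
    def closed_loop(s):
        return dynamics(s, kappa(s))

    x, cost = x0, 0.0
    for _ in range(int(round(horizon / dt))):
        l_start = running_cost(x, kappa(x))
        k1 = closed_loop(x)
        k2 = closed_loop(x + 0.5 * dt * k1)
        k3 = closed_loop(x + 0.5 * dt * k2)
        k4 = closed_loop(x + dt * k3)
        x += dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        cost += 0.5 * dt * (l_start + running_cost(x, kappa(x)))
    return cost


x0 = 1.5
J_opt = accumulated_cost(x0, optimal_policy)
J_other = accumulated_cost(x0, lambda s: -2.0 * s)  # another stabilizing gain
V_x0 = P * x0**2

print(f"J(x0, kappa*) = {J_opt:.6f}, V(x0) = {V_x0:.6f}")
print(f"J(x0, -2x)    = {J_other:.6f}")
```

The accumulated cost of the optimal feedback matches \(V(x_0)\) up to discretization error. Another stabilizing gain (here \(u=-2x\)) yields a strictly larger cost, consistent with \(V\) being the infimum over admissible feedback laws.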
### Formula Summary
- **Dynamical system**:
\[
\dot{x} = f(x)+g(x)u
\]
where \(x\in\mathbb{R}^n\) is the state vector, \(u\in\mathbb{R}^m\) is the input, \(f:\mathbb{R}^n\rightarrow\mathbb{R}^n\) is a continuously differentiable vector field, and \(g:\mathbb{R}^n\rightarrow\mathbb{R}^{n\times m}\) is a smooth function.
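- **Value function and optimal control**: the cost accumulates the running cost \(L\) along the trajectory \(S_t(x, u)\), the solution at time \(t\) starting from \(x\) under the input \(u\); \(U\) denotes the set of admissible feedback laws.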
\[
J(x, u)=\int_{0}^{\infty}L(S_t(x, u), u(t))dt
\]
\[
u^{*}=\kappa^{*}(x)\quad\text{such that}\quad J(x, u^{*})=\inf_{\kappa\in U}J(x, \kappa(x))
\]
\[
V(x):=J(x, u^{*})
\]
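- **Optimal control derivation**: substituting the control-affine dynamics into the Hamiltonian gives
\[
\kappa^{*}(x):=\arg\min_{u\in\mathbb{R}^m}\left\{L(x, u)+D V(x)\cdot\big(f(x)+g(x)u\big)\right\}=-\frac{1}{2}R^{-1}g^{T}(x)(D V(x))^{T}
\]
where the closed form assumes a running cost that is quadratic in the input, \(L(x,u)=q(x)+u^{T}Ru\) with \(R\) symmetric positive definite. Setting the gradient with respect to \(u\) to zero then gives \(2Ru+g^{T}(x)(DV(x))^{T}=0\).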
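The closed-form minimizer \(\kappa^{*}(x)=-\frac{1}{2}R^{-1}g^{T}(x)(DV(x))^{T}\) is what drives the policy-improvement step of policy iteration. The sketch below is a minimal, hypothetical illustration. It uses a scalar linear-quadratic case (\(f(x)=ax\), \(g(x)=b\), \(L(x,u)=qx^2+ru^2\), so \(R=r\)), where policy evaluation reduces to a scalar linear equation instead of the PDE that the paper solves with neural networks. This is Kleinman-style policy iteration, swapped in for illustration only. The function names `evaluate` and `improve` are assumptions. The sketch shows the value estimates \(P_i\) converging to the Riccati solution as the iterations proceed.

```python
import math

# Hypothetical scalar linear-quadratic instance of x_dot = f(x) + g(x) u:
#   f(x) = a*x, g(x) = b, L(x, u) = q*x**2 + r*u**2  (so R = r).
a, b, q, r = 1.0, 1.0, 1.0, 1.0


def evaluate(K):
    """Policy evaluation for u = -K x.

    V_K(x) = P x^2 must satisfy DV(x) (f(x) + g(x) u) + L(x, u) = 0, which
    here reduces to 2 (a - b K) P + q + r K^2 = 0.
    """
    closed_loop = a - b * K
    if closed_loop >= 0:
        raise ValueError("policy u = -K x is not stabilizing")
    return (q + r * K**2) / (-2.0 * closed_loop)


def improve(P):
    """Policy improvement: kappa(x) = -1/2 R^{-1} g^T DV(x)^T with DV(x) = 2 P x
    gives u = -(b P / r) x, i.e. the new gain K = b P / r."""
    return b * P / r


K = 2.0  # initial stabilizing gain, since a - b*K = -1 < 0
history = []
for _ in range(8):
    P = evaluate(K)
    history.append(P)
    K = improve(P)

P_riccati = (a + math.sqrt(a**2 + b**2 * q / r)) * r / b**2
for i, P_i in enumerate(history):
    print(f"iteration {i}: P = {P_i:.12f}, error = {abs(P_i - P_riccati):.2e}")
```

Starting from any stabilizing gain, the sequence \(P_i\) is non-increasing and converges quadratically to the Riccati solution. In the nonlinear setting, each evaluation step instead requires solving a linear first-order PDE for \(V\).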
### Conclusion
This paper proposes a novel method for optimal control of high-dimensional nonlinear systems and demonstrates its effectiveness in achieving stabilizing control and accurately approximating the value function. Although the systems considered so far are of limited dimension (up to 9), the method provides a solid foundation for solving control problems in higher-dimensional systems in the future.