Abstract: This paper proposes a policy learning algorithm based on Koopman operator theory and the policy gradient approach, which seeks to approximate an unknown dynamical system and search for an optimal policy simultaneously, using the observations gathered through interaction with the environment. The proposed algorithm has two innovations: first, it introduces the so-called deep Koopman representation into the policy gradient to achieve a linear approximation of the unknown dynamical system, with the purpose of improving data efficiency; second, the accumulated errors for long-term tasks induced by approximating system dynamics are avoided by applying Bellman's principle of optimality. Furthermore, a theoretical analysis is provided to prove the asymptotic convergence of the proposed algorithm and characterize the corresponding sampling complexity. These conclusions are also supported by simulations on several challenging benchmark environments.
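
As an illustrative sketch only, the two ideas in the abstract can be written in a common textbook form. The notation here is an assumption and does not come from the abstract: $g$ is a learned lifting of the state $x_t$, $u_t$ is the control input, $A$ and $B$ are constant matrices, $r$ is the reward, $\gamma \in (0,1)$ is a discount factor, and $V$ is the value function of the current policy.

$$
g(x_{t+1}) \approx A\, g(x_t) + B\, u_t, \qquad V(x_t) = r(x_t, u_t) + \gamma\, V(x_{t+1}).
$$

Under this reading, the value recursion needs only a one-step prediction of $x_{t+1}$ from the learned model, so modeling errors are not compounded over a long multi-step rollout. The exact model and value-function forms used in the paper may differ.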

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper proposes a policy learning algorithm based on Koopman operator theory and policy gradient methods (Policy Gradient with Deep Koopman Representation, abbreviated as PGDK). The algorithm aims to approximate unknown dynamical systems using observational data collected through interaction with the environment while simultaneously searching for the optimal policy.

#### Main Innovations:

1. **Introduction of Deep Koopman Representation**: Incorporates Deep Koopman Representation (DKR) into policy gradients to achieve a linear approximation of unknown dynamical systems, thereby improving data efficiency.
2. **Avoidance of Cumulative Errors in Long-term Tasks**: Applies Bellman's principle of optimality to avoid the cumulative errors in long-term tasks that arise from approximating the system dynamics.

#### Theoretical Analysis:

- Provides a theoretical analysis of the algorithm's asymptotic convergence and characterizes the corresponding sample complexity.
- These conclusions are supported by simulations in several challenging benchmark environments.

#### Background and Motivation:

- Reinforcement Learning (RL) is a machine learning paradigm in which agents are trained to make decisions in an environment by maximizing a reward function.
- Traditional Model-Free Reinforcement Learning (MFRL) methods do not require knowledge of the system dynamics, but usually need a large number of trials to find the optimal policy.
- Model-Based Reinforcement Learning (MBRL) requires less data, but may need more computational resources because it learns a model of the environment dynamics, and it is more sensitive to model inaccuracies.

#### Main Contributions:

- Proposes a data-efficient MBRL method that does not require prior knowledge of the environment.
- Utilizes Deep Koopman Representation to approximate system dynamics and represent complex nonlinear models in a linear form, enabling more effective control design.
- Compared to existing MFRL methods, this approach is reported to have higher data efficiency and faster convergence.
- Uses a value function estimated according to Bellman's principle of optimality to prevent cumulative prediction errors.

In summary, the paper aims to address the low data efficiency of existing MFRL methods on complex tasks and provides a new framework for handling unknown dynamical systems.
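
To make the idea of "representing a nonlinear model in a linear form" concrete, the following is a minimal sketch rather than the paper's PGDK algorithm. It makes several assumptions that do not come from the source: the toy system in `step`, the hand-picked lifting in `lift` (a DKR would learn this map with a neural network), and the helper name `fit_linear_lifted_model`. It fits matrices `A` and `B` by least squares so that the lifted state evolves linearly, then compares a short rollout of the linear model against the true nonlinear system.

```python
import numpy as np


def lift(x):
    """Hypothetical fixed lifting g(x) = [x1, x2, x1^2].

    Assumption: a deep Koopman representation would learn this map
    with a neural network instead of using a hand-picked one.
    """
    x = np.atleast_2d(x)
    return np.column_stack([x[:, 0], x[:, 1], x[:, 0] ** 2])


def step(x, u, a=0.9, b=0.5, c=0.3):
    """Toy nonlinear system (an assumption, not taken from the paper)."""
    x = np.atleast_2d(x)
    u = np.atleast_2d(u)
    x1, x2 = x[:, 0], x[:, 1]
    return np.column_stack([a * x1, b * x2 + c * x1 ** 2 + u[:, 0]])


def fit_linear_lifted_model(xs, us, xs_next):
    """Least-squares fit of g(x') ~= A g(x) + B u from transition data."""
    Z, Z_next = lift(xs), lift(xs_next)
    features = np.hstack([Z, us])                      # shape (n, 3 + 1)
    W, *_ = np.linalg.lstsq(features, Z_next, rcond=None)
    n_lift = Z.shape[1]
    A = W[:n_lift].T                                   # shape (3, 3)
    B = W[n_lift:].T                                   # shape (3, 1)
    return A, B


# Randomly sampled transitions stand in for data gathered by interaction.
rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, size=(200, 2))
us = rng.uniform(-1.0, 1.0, size=(200, 1))
A, B = fit_linear_lifted_model(xs, us, step(xs, us))

# Roll the true system and the lifted linear model forward together.
x = np.array([[0.5, -0.2]])
z = lift(x)
for u in [0.1, -0.3, 0.2]:
    x = step(x, u)
    z = z @ A.T + u * B.T

print("max state error after 3 steps:", np.abs(z[:, :2] - x).max())
```

This toy system was chosen because it is exactly linear in the lifted coordinates, so the fit is essentially exact. For a general nonlinear system no finite hand-picked lifting is guaranteed to work, which is the motivation for learning the lifting with a deep network.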

Policy Learning based on Deep Koopman Representation

Learning Deep Neural Network Representations for Koopman Operators of Nonlinear Dynamical Systems

Deep Koopman Learning using the Noisy Data

On Few Shot Learning of Dynamical Systems: A Koopman Operator Theoretic Approach

Distributed Deep Koopman Learning for Nonlinear Dynamics

DLKoopman: A deep learning software package for Koopman theory

Physics-informed Deep Koopman Operator for Lagrangian Dynamic Systems

Data-driven End-to-end Learning of Pole Placement Control for Nonlinear Dynamics via Koopman Invariant Subspaces

Learning Koopman Operators with Control Using Bi-level Optimization

Koopman-based Deep Learning for Nonlinear System Estimation

Learning Koopman-based Stability Certificates for Unknown Nonlinear Systems

Diffeomorphically Learning Stable Koopman Operators

Learning Bilinear Models of Actuated Koopman Generators from Partially-Observed Trajectories

Koopman-Assisted Reinforcement Learning

Deep Koopman Learning of Nonlinear Time-Varying Systems

Enhancing predictive capabilities in data-driven dynamical modeling with automatic differentiation: Koopman and neural ODE approaches

Data-driven optimal control of unknown nonlinear dynamical systems using the Koopman operator

Learning Stable Koopman Embeddings for Identification and Control

A Convex Optimization Approach to Learning Koopman Operators

Online Real-time Learning of Dynamical Systems from Noisy Streaming Data: A Koopman Operator Approach
