Abstract: Tasks with continuous state and action spaces are difficult to solve with high sample efficiency. Model learning and planning is a well-known way to improve sample efficiency: a system dynamics model is learned first and then used for planning. However, if the learned model does not capture the system dynamics accurately, the algorithm converges more slowly and sample efficiency suffers. To address problems with continuous state and action spaces, a model-learning-based actor-critic algorithm with a Gaussian process approximator, named MLAC-GPA, is proposed. The Gaussian process is chosen as the modeling method because it can capture the noise and uncertainty of the underlying system. In MLAC-GPA, the model is first represented by linear function approximation and then modeled by a Gaussian process; the mean vector and covariance matrix of the model parameters are then estimated by Bayesian inference. Once learned, the model is used for planning to accelerate the convergence of the value function and the policy. MLAC-GPA is implemented and compared with five representative methods on three classic benchmarks: Pole Balancing, Inverted Pendulum, and Mountain Car. The results show that MLAC-GPA outperforms the other methods in both learning rate and sample efficiency.
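The abstract describes estimating the mean vector and covariance matrix of the parameters of a linear model by Bayesian reasoning. The sketch below illustrates that general idea with a standard recursive Bayesian linear-regression update. It is not the MLAC-GPA algorithm itself: the feature map `features`, the toy system `s' = 0.9 s + 0.5 a`, the prior, and the noise variance are all assumptions chosen for illustration, and plain Python lists are used so the example has no dependencies.

```python
def bayes_linear_update(mean, cov, phi, y, noise_var):
    """One Bayesian update for y = phi . theta + noise, with theta ~ N(mean, cov)."""
    n = len(mean)
    # cov_phi = cov @ phi
    cov_phi = [sum(cov[i][j] * phi[j] for j in range(n)) for i in range(n)]
    # Predictive variance of y under the current belief about theta
    pred_var = noise_var + sum(phi[i] * cov_phi[i] for i in range(n))
    # Gain vector: how strongly each parameter reacts to the prediction error
    gain = [c / pred_var for c in cov_phi]
    error = y - sum(phi[i] * mean[i] for i in range(n))
    new_mean = [mean[i] + gain[i] * error for i in range(n)]
    new_cov = [[cov[i][j] - gain[i] * cov_phi[j] for j in range(n)]
               for i in range(n)]
    return new_mean, new_cov


def features(state, action):
    # Hypothetical feature map (assumption): bias, state, action
    return [1.0, state, action]


# Toy system assumed for illustration only: s' = 0.9 * s + 0.5 * a
mean = [0.0, 0.0, 0.0]                       # prior mean of the parameters
cov = [[10.0 if i == j else 0.0 for j in range(3)] for i in range(3)]  # broad prior
samples = [(s, a, 0.9 * s + 0.5 * a)
           for s in (-1.0, 0.0, 1.0) for a in (-1.0, 1.0)]

for s, a, s_next in samples:
    mean, cov = bayes_linear_update(mean, cov, features(s, a), s_next,
                                    noise_var=0.01)

print("posterior mean:", mean)
print("posterior variances:", [cov[i][i] for i in range(3)])
```

After the six samples, the posterior mean should lie close to the true coefficients of the toy system, and the diagonal of the covariance should shrink well below the prior value. A planner could use that remaining covariance as a measure of how much to trust the learned model's predictions.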

Adaptive Critic Design with Local Gaussian Process Models

Two-Phase Iteration for Value Function Approximation and Hyperparameter Optimization in Gaussian-Kernel-Based Adaptive Critic Design

Gaussian-Kernel-Based Adaptive Critic Design Using Two-Phase Value Iteration

Active Design of Dynamic GP Models for Model Predictive Control Using Expected Improvement

Modeling-Learning-Based Actor-Critic Algorithm with Gaussian Process Approximator

Actor-critic algorithm based on Gaussian process

Online Sparse Kernel Learning-Based Adaptive Dynamic Programming

A Method of Adaptive Hyperparameter Optimization for Deep Generative Models

A CUDA-based Parallel Adaptive Dynamic Programming Algorithm

Model-free Adaptive Dynamic Programming for Optimal Control of Discrete-time Affine Nonlinear System

Adaptive Critic Design with Graph Laplacian for Online Learning Control of Nonlinear Systems

Tracking Control of Affine Nonlinear Discrete-Time Systems Based on Gaussian-kernel-based ADP

Twin Deterministic Policy Gradient Adaptive Dynamic Programming for Optimal Control of Affine Nonlinear Discrete-time Systems

Deterministic Policy Gradient Adaptive Dynamic Programming for Model-Free Optimal Control

A Global-Local Approximation Framework for Large-Scale Gaussian Process Modeling

Distributed Gaussian Processes Hyperparameter Optimization for Big Data Using Proximal ADMM

Kernel-based Adaptive Critic Designs for Optimal Control of Nonlinear Discrete-time System

Finite-Approximation-Error-Based Discrete-Time Iterative Adaptive Dynamic Programming

A Single-NN Iterative Adaptive Dynamic Programming Algorithm for Continuous-Time Nonlinear Zero-Sum Games

Hamiltonian-Driven Adaptive Dynamic Programming for Continuous Nonlinear Dynamical Systems

Real-Time Adaptive Safety-Critical Control with Gaussian Processes in High-Order Uncertain Models