Abstract: Reinforcement learning (RL) is an effective way to control dynamic systems without prior knowledge. One of the most important and difficult problems in RL is improving data efficiency. Probabilistic inference for learning control (PILCO) is a state-of-the-art data-efficient framework that uses a Gaussian process (GP) to model system dynamics. However, PILCO focuses only on optimizing cumulative rewards and does not consider the accuracy of the dynamics model, which is an important factor in controller learning. To further improve the data efficiency of PILCO, we propose an active exploration version of it (AEPILCO) that uses information entropy to measure how informative samples are. In the policy evaluation stage, we incorporate an information entropy criterion into the long-term prediction of samples. Optimizing this informative policy evaluation function in the policy improvement stage yields informative policy parameters. Executing these parameters on the actual system produces an informative sample set, which helps in learning an accurate dynamics model. In this way, AEPILCO improves data efficiency: it actively selects informative samples according to the information entropy criterion and thereby learns a more accurate dynamics model. We demonstrate the validity and efficiency of the proposed algorithm on several challenging control problems: a cart pole, a pendubot, a double pendulum, and a cart double pendulum. AEPILCO learns a controller in fewer trials than PILCO, as verified by both theoretical analysis and experimental results.
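The idea of an "informative policy evaluation function" can be illustrated with a small sketch. It assumes that, as in PILCO, the long-term prediction yields a Gaussian state distribution N(mean_t, cov_t) at each time step, and that the expected cost uses PILCO's saturating cost, which has a closed form under a Gaussian. The entropy term is the differential entropy of each predicted Gaussian. The combination rule "expected cost minus beta times entropy" and the names `informative_objective`, `beta`, and `width` are hypothetical illustrations, not the exact AEPILCO criterion from the paper.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (nats) of N(mu, cov): 0.5 * (D log(2*pi*e) + log|cov|)."""
    d = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)


def expected_saturating_cost(mean, cov, target, width=1.0):
    """Closed-form E[1 - exp(-||x - target||^2 / (2 width^2))] for x ~ N(mean, cov).

    Uses E[exp(-0.5 (x-z)^T W (x-z))] = |I + cov W|^(-1/2)
        * exp(-0.5 (mean-z)^T W (I + cov W)^(-1) (mean-z)), with W = I / width^2.
    """
    d = mean.shape[0]
    W = np.eye(d) / width**2
    I_plus = np.eye(d) + cov @ W
    diff = mean - target
    S = W @ np.linalg.inv(I_plus)  # equals (W^-1 + cov)^-1, symmetric
    return 1.0 - np.exp(-0.5 * diff @ S @ diff) / np.sqrt(np.linalg.det(I_plus))


def informative_objective(means, covs, target, beta=0.1, width=1.0):
    """Hypothetical AEPILCO-style evaluation: sum_t (E[cost_t] - beta * H[x_t]).

    Lower is better. beta (an assumption here) trades off task cost against
    the entropy of the predicted states, i.e. how informative a rollout may be.
    """
    total = 0.0
    for m, c in zip(means, covs):
        total += expected_saturating_cost(m, c, target, width) - beta * gaussian_entropy(c)
    return total


# Usage: a short predicted trajectory of 2-D states with growing uncertainty.
target = np.zeros(2)
means = [np.array([1.0, 0.5]), np.array([0.5, 0.2]), np.array([0.1, 0.0])]
covs = [0.1 * np.eye(2), 0.2 * np.eye(2), 0.3 * np.eye(2)]
print(informative_objective(means, covs, target, beta=0.1))
```

With `beta = 0` this reduces to a PILCO-style sum of expected costs. With `beta > 0`, among policies with similar expected cost, the one whose predicted states are more uncertain scores lower and would be preferred, which is one plausible way to bias data collection toward samples the GP model knows least about.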

Active Exploration Planning in Reinforcement Learning for Inverted Pendulum System Control

Limit cycles in inverted pendulum system by reinforcement learning

The Negative Effect on the Control of Inverted Pendulum Caused by the Limit Cycle in Reinforcement Learning

Model-Based Robot Learning Control with Uncertainty Directed Exploration

Control of the Double Inverted Pendulum Based on Reinforcement Learning

Swing-up and Balance Control of Inverted Pendulum Based on Reinforcement Learning

Design Of Reinforcement Learning Algorathm For Single Inverted Pendulum Swing Control

Application of Adaptive Reinforcement Learning for State Space Construction

Adaptive Control of an Inverted Pendulum by a Reinforcement Learning-based LQR Method

Active Exploration Deep Reinforcement Learning for Continuous Action Space with Forward Prediction

A Q-learning approach to the continuous control problem of robot inverted pendulum balancing

Vague Neural Network Based Reinforcement Learning Control System For Inverted Pendulum

Design Of Reinforce Learning Control Algorithm And Verified In Inverted Pendulum

Learning-based Model Predictive Control for Safe Exploration and Reinforcement Learning

Design and Experiment of Variable Structure Controlle for Double Inverted Pendulum

An Active Exploration Method for Data Efficient Reinforcement Learning

Solve the Inverted Pendulum Problem Base on DQN Algorithm

Balance Controller Design for Inverted Pendulum Considering Detail Reward Function and Two-Phase Learning Protocol

An Optimization Method for the Inverted Pendulum Problem Based on Deep Reinforcement Learning

A Kernel-Based Reinforcement Learning Approach To Stochastic Pole Balancing Control Systems

RTP-Q: a Reinforcement Learning System with an Active Exploration Planning Structure for Enhancing the Convergence Rate