Abstract: In recent years, the proximal policy optimization (PPO) algorithm has received considerable attention because of its strong performance on many challenging tasks. However, the mechanism behind PPO's horizontal clipping operation, which is a key means of improving PPO's performance, still lacks a thorough theoretical explanation. In addition, although PPO is inspired by the learning theory of trust region policy optimization (TRPO), the theoretical connection between PPO's clipping operation and TRPO's trust region constraint has not been well studied. In this article, we first analyze the effect of PPO's clipping operation on the objective function of conservative policy iteration and rigorously establish the theoretical relationship between PPO and TRPO. Then, we propose a novel first-order policy gradient algorithm called authentic boundary PPO (ABPPO), which is based on the authentic boundary setting rule. To better keep the difference between the new and old policies within the clipping range, we build on the idea of ABPPO and propose two further improved PPO algorithms, rollback mechanism-based ABPPO (RMABPPO) and penalized point policy difference-based ABPPO (P3DABPPO), which are based on rollback clipping and a penalized point policy difference, respectively. Experiments on continuous robotic control tasks in MuJoCo show that the proposed algorithms effectively improve learning stability and accelerate learning compared with the original PPO.
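
The difference between plain clipping and rollback clipping can be seen on a single sample. The sketch below is an illustration only: it assumes the standard PPO clipped surrogate and a rollback function of the form used in Truly PPO (Wang et al., 2019), where the objective gets slope `-alpha` outside the clipping range instead of going flat. The exact boundary and rollback rules of ABPPO and RMABPPO may differ, and the function names and the `alpha` parameter are hypothetical.

```python
def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(x, hi))


def clipped_surrogate(ratio: float, adv: float, eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate for one sample.

    ratio = pi_new(a|s) / pi_old(a|s); adv = advantage estimate.
    Once the ratio leaves [1 - eps, 1 + eps] in the direction favoured by
    the advantage, the objective is flat, so its gradient w.r.t. ratio is 0.
    """
    return min(ratio * adv, clip(ratio, 1.0 - eps, 1.0 + eps) * adv)


def rollback(ratio: float, eps: float, alpha: float) -> float:
    """Rollback function (assumed form, after Truly PPO).

    Identity inside the clipping range; outside it the slope is -alpha, so
    the objective actively pushes the ratio back toward the range. The
    function is continuous at ratio = 1 - eps and ratio = 1 + eps.
    """
    if ratio <= 1.0 - eps:
        return -alpha * ratio + (1.0 + alpha) * (1.0 - eps)
    if ratio >= 1.0 + eps:
        return -alpha * ratio + (1.0 + alpha) * (1.0 + eps)
    return ratio


def rollback_surrogate(ratio: float, adv: float, eps: float = 0.2,
                       alpha: float = 0.3) -> float:
    """Pessimistic (min) surrogate using rollback instead of flat clipping."""
    return min(ratio * adv, rollback(ratio, eps, alpha) * adv)


def slope(f, ratio: float, h: float = 1e-6) -> float:
    """Central finite-difference derivative of f with respect to ratio."""
    return (f(ratio + h) - f(ratio - h)) / (2.0 * h)


if __name__ == "__main__":
    adv = 1.0
    r = 1.5  # ratio already beyond 1 + eps with a positive advantage
    print(clipped_surrogate(r, adv))                       # flat region: 1.2
    print(rollback_surrogate(r, adv))                      # about 1.11
    print(slope(lambda x: clipped_surrogate(x, adv), r))   # 0.0
    print(slope(lambda x: rollback_surrogate(x, adv), r))  # about -0.3
```

With plain clipping, a sample whose ratio has already overshot `1 + eps` contributes no gradient, so nothing pulls the policy back; with the rollback form, the negative slope actively drives the ratio toward the clipping range, which is the behavior rollback-based variants aim for.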

Authentic Boundary Proximal Policy Optimization