Abstract: Model-Based Reinforcement Learning (MBRL) is increasingly applied in robot learning because of its sample efficiency and asymptotic performance. However, for high-dimensional learning tasks in complex scenes, the robot's exploration and training stability still need improvement. From the perspectives of policy planning and policy optimization, we propose a bidirectional model-based policy optimization algorithm based on adaptive Gaussian noise and improved confidence weights (BMPO-NW). The algorithm turns the bidirectional policy networks into noisy networks by adding separate adaptive Gaussian noise terms to the connection weights and biases, which increases the randomness of policy search and induces efficient exploration. In addition, a confidence weight with an improved activation function is introduced into the Q-function update of Soft Actor-Critic (SAC), which reduces error propagation from the target Q-network and improves training stability. Finally, we implement the improved algorithm within the framework of bidirectional model-based policy optimization (BMPO) to preserve asymptotic performance and sample efficiency. Experimental results on MuJoCo benchmark environments show that BMPO-NW learns about 20% faster than baseline methods and achieves an average reward about 15% higher than other MBRL methods and 50%-70% higher than model-free reinforcement learning (MFRL) methods, with a more stable training process. Ablation experiments and experiments with different variant designs further verify the feasibility and robustness of the method. The results support the conclusions of this paper and have significant practical value for applying MBRL to robots in complex scenarios.
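
The noise mechanism described in the abstract can be illustrated with a minimal sketch of a single noisy linear layer, in which every weight and bias is a mean plus a scaled Gaussian sample. This is not the authors' implementation: the class name `NoisyLinear`, the `sigma_init` value, and the initialization bounds are assumptions made for illustration. In an adaptive scheme the noise scales would be learned together with the means during training, whereas here they are fixed constants.

```python
import math
import random


class NoisyLinear:
    """Hypothetical linear layer with Gaussian noise on weights and biases."""

    def __init__(self, in_dim, out_dim, sigma_init=0.5, seed=0):
        self.rng = random.Random(seed)
        bound = 1.0 / math.sqrt(in_dim)
        # Mean parameters (these would be trained by gradient descent).
        self.w_mu = [[self.rng.uniform(-bound, bound) for _ in range(in_dim)]
                     for _ in range(out_dim)]
        self.b_mu = [self.rng.uniform(-bound, bound) for _ in range(out_dim)]
        # Noise scales (assumed constant here; learned in an adaptive scheme).
        self.w_sigma = [[sigma_init * bound] * in_dim for _ in range(out_dim)]
        self.b_sigma = [sigma_init * bound] * out_dim

    def forward(self, x, noisy=True):
        """Return W x + b, drawing fresh noise for W and b when noisy=True."""
        out = []
        for j, (row_mu, row_sigma) in enumerate(zip(self.w_mu, self.w_sigma)):
            total = self.b_mu[j]
            if noisy:
                total += self.b_sigma[j] * self.rng.gauss(0.0, 1.0)
            for xi, mu, sigma in zip(x, row_mu, row_sigma):
                eps = self.rng.gauss(0.0, 1.0) if noisy else 0.0
                total += (mu + sigma * eps) * xi
            out.append(total)
        return out


layer = NoisyLinear(in_dim=4, out_dim=3)
state = [1.0, 0.5, -0.5, 2.0]
print(layer.forward(state, noisy=False))  # deterministic output
print(layer.forward(state))               # perturbed output, varies per call
```

Because the perturbation is applied to the parameters rather than to the action, each forward pass effectively samples a slightly different policy, which is the source of the extra randomness in policy search.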

Model-Based Off-Policy Deep Reinforcement Learning with Model-Embedding

Model Embedding Model-Based Reinforcement Learning

Model-Based Reinforcement Learning Via Imagination with Derived Memory

A survey on model-based reinforcement learning

Dyna-style Model-based reinforcement learning with Model-Free Policy Optimization

Model-Assisted Reinforcement Learning with Adaptive Ensemble Value Expansion

Model-Based Reinforcement Learning via Meta-Policy Optimization

Look Before You Leap: Safe Model-Based Reinforcement Learning with Human Intervention

Sample-Efficient Reinforcement Learning Based on Dynamics Models via Meta-policy Optimization

Benchmarking Model-Based Reinforcement Learning

Physics-informed Dyna-style model-based deep reinforcement learning for dynamic control

Policy-shaped prediction: avoiding distractions in model-based reinforcement learning

Plan to Predict: Learning an Uncertainty-Foreseeing Model for Model-Based Reinforcement Learning

Model-Based Reinforcement Learning Inspired by Augmented PD for Robotic Control

Bidirectional Model-Based Policy Optimization Based on Adaptive Gaussian Noise and Improved Confidence Weights

Offline Model-Based Reinforcement Learning with Anti-Exploration

Learning Latent Dynamic Robust Representations for World Models

An Analysis of Model-Based Reinforcement Learning From Abstracted Observations

Algorithmic Framework for Model-based Deep Reinforcement Learning with Theoretical Guarantees

Harmony World Models: Boosting Sample Efficiency for Model-based Reinforcement Learning

Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning