Abstract: Model-Based Reinforcement Learning (MBRL) is increasingly applied in robot learning because of its sample efficiency and asymptotic performance. However, for high-dimensional learning tasks in complex scenes, the robot's exploration and training stability still need improvement. Addressing both policy planning and policy optimization, we propose a bidirectional model-based policy optimization algorithm based on adaptive Gaussian noise and improved confidence weights (BMPO-NW). The algorithm turns the bidirectional policy networks into noisy networks by adding different adaptive Gaussian noise terms to the connection weights and biases. This increases the randomness of the policy search and induces efficient exploration by the robot. In addition, a confidence weight based on an improved activation function is introduced into the Q-function update formula of SAC, which reduces error propagation from the target Q-network and improves the robot's training stability. Finally, we implement the improved algorithm within the framework of the bidirectional model-based policy optimization algorithm (BMPO) to preserve asymptotic performance and sample efficiency. Experimental results in MuJoCo benchmark environments show that the learning speed of BMPO-NW is about 20% higher than that of the baseline methods, and that its average reward is about 15% higher than that of other MBRL methods and 50%-70% higher than that of MFRL methods, while the training process is more stable. Ablation experiments and experiments with different variant designs further verify the feasibility and robustness of the method. These results support the paper's conclusions and have practical value for applying MBRL to robots in complex scenarios.
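The idea of adding Gaussian noise to connection weights and biases can be illustrated with a minimal sketch of a generic noisy linear layer. This is not the BMPO-NW implementation: the abstract does not give the exact noise parameterization, so the class name `NoisyLinear`, the per-parameter noise scales `w_sigma` and `b_sigma`, and the initial value `sigma_init` are all assumptions made for illustration. In a trained network the means and noise scales would be learned, which is one plausible reading of "adaptive"; here they are fixed.

```python
import math
import random


class NoisyLinear:
    """Hypothetical linear layer with Gaussian noise on weights and biases."""

    def __init__(self, in_dim, out_dim, sigma_init=0.1, seed=0):
        self.rng = random.Random(seed)
        bound = 1.0 / math.sqrt(in_dim)
        # Mean parameters (these would be learned in a real implementation).
        self.w_mu = [[self.rng.uniform(-bound, bound) for _ in range(in_dim)]
                     for _ in range(out_dim)]
        self.b_mu = [self.rng.uniform(-bound, bound) for _ in range(out_dim)]
        # Noise scales, one per weight and bias (assumed learnable in practice).
        self.w_sigma = [[sigma_init] * in_dim for _ in range(out_dim)]
        self.b_sigma = [sigma_init] * out_dim

    def forward(self, x, noisy=True):
        """Compute (w_mu + w_sigma * eps_w) x + (b_mu + b_sigma * eps_b)."""
        out = []
        for i, row in enumerate(self.w_mu):
            total = self.b_mu[i]
            if noisy:
                total += self.b_sigma[i] * self.rng.gauss(0.0, 1.0)
            for j, w in enumerate(row):
                if noisy:
                    # Fresh standard-normal sample for every weight.
                    w += self.w_sigma[i][j] * self.rng.gauss(0.0, 1.0)
                total += w * x[j]
            out.append(total)
        return out


layer = NoisyLinear(in_dim=4, out_dim=2)
x = [1.0, 1.0, 1.0, 1.0]
clean = layer.forward(x, noisy=False)  # deterministic output from the means
noisy_a = layer.forward(x)             # perturbed output, new noise per call
noisy_b = layer.forward(x)
```

With `noisy=False` the layer behaves like an ordinary linear layer, while each noisy call draws new perturbations, so repeated evaluations of the same input give different outputs. That variation is what drives the extra randomness in policy search described in the abstract.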

Cooperative Q-Learning Based On Maturity Of The Policy

Learning to Cooperate: Application of Deep Reinforcement Learning for Online AGV Path Finding

Learning Observation-Based Certifiable Safe Policy for Decentralized Multi-Robot Navigation

Multi-Agent Path Finding Method Based on Evolutionary Reinforcement Learning

Learning Effective Communication for Cooperative Pursuit with Multi-Agent Reinforcement Learning

Cooperative Flocking And Learning In Multi-Robot Systems For Predator Avoidance

Q-CP: Learning Action Values for Cooperative Planning

Reinforcement learning for encouraging cooperation in a multiagent system

Compound Heuristic Information Guided Policy Improvement for Robot Motor Skill Acquisition

An off-policy multi-agent stochastic policy gradient algorithm for cooperative continuous control

Edge-conditioned vector basis functions for the analysis and optimization of rectangular waveguide dual-mode filters

QDAP: Downsizing adaptive policy for cooperative multi-agent reinforcement learning

Adaptive Individual Q-Learning: A Multiagent Reinforcement Learning Method for Coordination Optimization

Robot Policy Improvement With Natural Evolution Strategies for Stable Nonlinear Dynamical System

Multi-robot Cooperative Navigation Method based on Multi-agent Reinforcement Learning in Sparse Reward Tasks

Multiple rewards fuzzy reinforcement learning algorithm in RoboCup environment

Improved Q-Learning Method for Multirobot Formation and Path Planning with Concave Obstacles

Cooperative Learning of Multi-Agent Systems Via Reinforcement Learning

A Local Information Aggregation based Multi-Agent Reinforcement Learning for Robot Swarm Dynamic Task Allocation

Bidirectional Model-Based Policy Optimization Based on Adaptive Gaussian Noise and Improved Confidence Weights

A Policy Gradient Algorithm to Alleviate the Multi-Agent Value Overestimation Problem in Complex Environments