Abstract: With sufficient practice, humans can grasp objects they have never seen before by relying on the brain's decision-making. Manipulators, however, despite their wide range of applications in industrial production, can still grasp only specific objects. This is because most grasping algorithms rely on prior knowledge, such as hand-eye calibration results and object model features, and therefore work only for specific types of objects; when the task scenario or the target object changes, they cannot be redeployed effectively. To solve these problems, researchers often train grasping algorithms with reinforcement learning. When applied to manipulator grasping, however, reinforcement learning faces three main problems: insufficient sample utilization, poor algorithmic stability, and limited exploration. This article combines learning from demonstration (LfD), behavior cloning (BC), and deep deterministic policy gradient (DDPG) to improve sample utilization. It uses an ensemble of critics to jointly evaluate input actions, which addresses the instability of the algorithm. Finally, inspired by Thompson sampling, it evaluates each input action from different perspectives, which increases the algorithm's exploration of the environment and reduces the number of interactions with it. On this basis, the article designs the EDDPG and EBDDPG algorithms. To further improve generalization, the algorithms do not use extra information that is difficult to obtain directly on a physical platform, such as the true coordinates of the target object, and they output decisions as continuous motions of the manipulator's end effector in the Cartesian coordinate system. Simulation results show that, for the same number of interactions, the manipulator's success rate in grasping 1000 random objects more than doubled and reached state-of-the-art (SOTA) performance.
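The abstract names three ingredients: a DDPG actor objective augmented with behavior cloning on demonstrations, an ensemble of critics whose estimates are combined, and Thompson-sampling-style exploration. The following is a minimal NumPy sketch of how these pieces might fit together, assuming a bootstrapped-ensemble design in the manner of Bootstrapped DQN. It is not the authors' EDDPG/EBDDPG implementation: the linear `LinearCritic` heads stand in for neural networks, and the function names, the Bernoulli mask probability `p`, the BC weight `lam`, and the Q-filter on the BC term are all hypothetical choices. Only loss values are computed; in practice an autodiff framework would differentiate them.

```python
import numpy as np


class LinearCritic:
    """Hypothetical critic head: Q(s, a) = w . concat(s, a) + b."""

    def __init__(self, w, b=0.0):
        self.w = np.asarray(w, dtype=float)
        self.b = float(b)

    def q(self, s, a):
        x = np.concatenate([np.atleast_2d(s), np.atleast_2d(a)], axis=-1)
        return x @ self.w + self.b  # shape (batch,)


def ensemble_q(critics, s, a):
    """Integrated evaluation: mean of all heads' Q estimates."""
    return np.mean([c.q(s, a) for c in critics], axis=0)


def sample_head(num_heads, rng):
    """Thompson-sampling-style exploration: draw one head whose Q estimate
    guides action selection for a whole episode."""
    return int(rng.integers(num_heads))


def bootstrap_mask(num_heads, p, rng):
    """Mask stored with each transition: head k trains on it only if
    mask[k] is True (Bernoulli(p) per head)."""
    return rng.random(num_heads) < p


def td_targets(target_critics, r, s_next, a_next, done, gamma=0.99):
    """Per-head TD targets y_k = r + gamma * (1 - done) * Q'_k(s', a'),
    where a' would come from the target actor."""
    return np.stack([r + gamma * (1.0 - done) * c.q(s_next, a_next)
                     for c in target_critics])  # shape (K, batch)


def masked_critic_loss(critics, targets, s, a, masks):
    """Mean-squared TD error per head, counting only that head's masked samples.
    masks has shape (batch, K)."""
    losses = []
    for k, c in enumerate(critics):
        m = masks[:, k].astype(float)
        err = (c.q(s, a) - targets[k]) ** 2
        losses.append((m * err).sum() / max(m.sum(), 1.0))
    return np.array(losses)


def actor_loss(critics, s, pi_s, s_demo, a_demo, pi_demo, lam=1.0):
    """DDPG term (maximize ensemble Q) plus a Q-filtered BC term on demos."""
    rl_term = -ensemble_q(critics, s, pi_s).mean()
    # Q-filter (assumption): imitate a demo action only where the ensemble
    # rates it above the current policy's action in the same state.
    keep = ensemble_q(critics, s_demo, a_demo) > ensemble_q(critics, s_demo, pi_demo)
    bc = np.sum((pi_demo - a_demo) ** 2, axis=-1)
    return rl_term + lam * np.mean(bc * keep)
```

The sketch averages all heads for the actor objective, which smooths out the error of any single critic, while a single sampled head drives exploration for an episode, so different episodes probe the environment according to different plausible value estimates.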

An Experience-Based Policy Gradient Method for Smooth Manipulation

Ensemble Bootstrapped Deep Deterministic Policy Gradient For Vision-Based Robotic Grasping

Generalize Robot Learning from Demonstration to Variant Scenarios with Evolutionary Policy Gradient

A Policy Gradient Algorithm Integrating Long and Short-Term Rewards for Soft Continuum Arm Control

A Manipulator Control Method Based on Deep Deterministic Policy Gradient with Parameter Noise

A novel policy gradient algorithm with PSO-based parameter exploration for continuous control

Robot Grasping Method Optimization Using Improved Deep Deterministic Policy Gradient Algorithm of Deep Reinforcement Learning

Optimization of Robotic Arm Grasping through Fractional-Order Deep Deterministic Policy Gradient Algorithm

Policy ensemble gradient for continuous control problems in deep reinforcement learning

Training Efficient Controllers via Analytic Policy Gradient

Reward-Adaptive Reinforcement Learning: Dynamic Policy Gradient Optimization for Bipedal Locomotion

Biped Robots Control in Gusty Environments with Adaptive Exploration Based DDPG

Asynchronous Episodic Deep Deterministic Policy Gradient: Toward Continuous Control in Computationally Complex Environments

Hybrid and dynamic policy gradient optimization for bipedal robot locomotion

Model-Based DDPG for Motor Control

Continuous Shared Control in Prosthetic Hand Grasp Tasks by Deep Deterministic Policy Gradient with Hindsight Experience Replay

Robot Policy Improvement With Natural Evolution Strategies for Stable Nonlinear Dynamical System

Residual Policy Learning for Perceptive Quadruped Control Using Differentiable Simulation

Feedback Deep Deterministic Policy Gradient With Fuzzy Reward for Robotic Multiple Peg-in-Hole Assembly Tasks

Deep Deterministic Policy Gradient with Episode Experience Replay

Moving Object Grasping Method of Mechanical Arm Based on Deep Deterministic Policy Gradient and Hindsight Experience Replay