Abstract: With sufficient practice, humans can grasp objects they have never seen before. Manipulators, although widely used in industrial production, can still grasp only specific objects. This is because most grasping algorithms rely on prior knowledge, such as hand-eye calibration results and object model features, and target only specific types of objects. When the task scenario or the target object changes, they cannot be redeployed effectively. To address these problems, researchers often use reinforcement learning to train grasping algorithms. However, reinforcement learning for manipulator grasping faces three main problems: insufficient sample utilization, poor algorithm stability, and limited exploration. This article combines learning from demonstration (LfD), behavior cloning (BC), and deep deterministic policy gradient (DDPG) to improve sample utilization. It uses multiple critics to jointly evaluate input actions, which addresses the instability of the algorithm. Finally, inspired by Thompson sampling, the input action is evaluated from different perspectives, which increases the algorithm's exploration of the environment and reduces the number of interactions with the environment. On this basis, the article designs the EDDPG and EBDDPG algorithms. To further improve the generalization ability of the algorithm, this article does not use extra information that is difficult to obtain directly on a physical platform, such as the true coordinates of the target object, and it uses the continuous motion space of the manipulator's end effector in the Cartesian coordinate system as the decision output. Simulation results show that, under the same number of interactions, the manipulator's success rate in grasping 1000 random objects more than doubles and reaches state-of-the-art (SOTA) performance.
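The two ideas named in the abstract, evaluating an action with multiple critics and exploring in the spirit of Thompson sampling, can be illustrated with a minimal sketch. This is not the article's EDDPG or EBDDPG implementation: the toy linear critics, the averaging rule, the candidate-action set, and all names and sizes (`NUM_CRITICS`, `thompson_select`, and so on) are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for this illustration.
STATE_DIM, ACTION_DIM, NUM_CRITICS = 4, 3, 5

# Each "critic" is a toy linear Q-function: Q_k(s, a) = w_k . [s, a].
# A real agent would use neural networks trained on replayed experience.
critic_weights = rng.normal(size=(NUM_CRITICS, STATE_DIM + ACTION_DIM))


def q_values(state, action):
    """Return every critic's Q estimate for one (state, action) pair."""
    features = np.concatenate([state, action])
    return critic_weights @ features  # shape: (NUM_CRITICS,)


def ensemble_q(state, action):
    """Aggregate the critics by averaging (one assumed aggregation rule)."""
    return float(q_values(state, action).mean())


def thompson_select(state, candidate_actions, head):
    """Pick the candidate action that the sampled critic `head` rates highest."""
    scores = [q_values(state, a)[head] for a in candidate_actions]
    return candidate_actions[int(np.argmax(scores))]


state = rng.normal(size=STATE_DIM)
# Candidate end-effector displacements in Cartesian space (assumed range).
candidates = rng.uniform(-1.0, 1.0, size=(16, ACTION_DIM))

# Thompson-style exploration: sample one critic and act greedily under it.
head = int(rng.integers(NUM_CRITICS))
action = thompson_select(state, candidates, head)

print("sampled critic:", head)
print("chosen action:", action)
print("ensemble estimate:", ensemble_q(state, action))
```

Sampling a different critic each episode makes the agent act on a different plausible value estimate each time, which is one way to obtain directed exploration, while the averaged estimate is less sensitive to the error of any single critic.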

Optimizing TD3 for 7-DOF Robotic Arm Grasping: Overcoming Suboptimality with Exploration-Enhanced Contrastive Learning

Ensemble Bootstrapped Deep Deterministic Policy Gradient For Vision-Based Robotic Grasping

Generalize Robot Learning from Demonstration to Variant Scenarios with Evolutionary Policy Gradient

Adaptive trajectory-constrained exploration strategy for deep reinforcement learning

Non-local Policy Optimization via Diversity-regularized Collaborative Exploration

Leveraging the Efficiency of Multi-Task Robot Manipulation Via Task-Evoked Planner and Reinforcement Learning

A Novel Robotic Grasping Method for Moving Objects Based on Multi-Agent Deep Reinforcement Learning

Improved Exploration through Latent Trajectory Optimization in Deep Deterministic Policy Gradient

ACDER: Augmented Curiosity-Driven Experience Replay

Learning Visual Robotic Control Efficiently with Contrastive Pre-training and Data Augmentation

Efficiently Training On-Policy Actor-Critic Networks in Robotic Deep Reinforcement Learning with Demonstration-like Sampled Exploration

Optimization of Robotic Arm Grasping through Fractional-Order Deep Deterministic Policy Gradient Algorithm

Optimizing Deep Reinforcement Learning for Adaptive Robotic Arm Control

PP-PG: Combining Parameter Perturbation with Policy Gradient Methods for Effective and Efficient Explorations in Deep Reinforcement Learning

Mitigating Estimation Errors by Twin TD-Regularized Actor and Critic for Deep Reinforcement Learning

Goal-Auxiliary Actor-Critic for 6D Robotic Grasping with Point Clouds

Inspection Robot Navigation Based on Improved TD3 Algorithm

Learning Complicated Manipulation Skills via Deterministic Policy with Limited Demonstrations

Evolutionary Action Selection for Gradient-based Policy Learning

Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions

A Modified Convergence DDPG Algorithm for Robotic Manipulation