Abstract: With sufficient practice, humans can grasp objects they have never seen before through decision-making in the brain. However, manipulators, which are widely used in industrial production, can still grasp only specific objects. This is because most grasping algorithms rely on prior knowledge, such as hand-eye calibration results and object model features, and can target only specific types of objects. When the task scenario or the target object changes, they cannot be redeployed effectively. To address these problems, researchers often use reinforcement learning to train grasping algorithms. However, reinforcement learning for manipulator grasping faces three main problems: low sample utilization, poor algorithm stability, and limited exploration. This article combines learning from demonstration (LfD), behavior cloning (BC), and deep deterministic policy gradient (DDPG) to improve sample utilization. Multiple critics are integrated to evaluate input actions, which addresses the problem of algorithm instability. Finally, inspired by Thompson sampling, the input action is evaluated from different perspectives, which increases the algorithm's exploration of the environment and reduces the number of interactions with the environment. The article designs the EDDPG and EBDDPG algorithms. To further improve the generalization ability of the algorithm, this article does not use extra information that is difficult to obtain directly on a physical platform, such as the real coordinates of the target object, and uses the continuous motion space of the end of the manipulator in the Cartesian coordinate system as the output of the decision. Simulation results show that, for the same number of interactions, the manipulator's success rate in grasping 1000 random objects more than doubles and reaches state-of-the-art (SOTA) performance.
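The two uses of multiple critics described in the abstract can be illustrated with a minimal sketch. It is not the EDDPG or EBDDPG implementation: the linear critics, the averaging rule, the candidate-action list, and all function names (`make_linear_critic`, `ensemble_q`, `thompson_style_action`) are assumptions made for illustration only. Averaging over critics stands in for the "integrate and evaluate" step, and drawing one critic at random before acting greedily stands in for the Thompson-sampling-inspired exploration.

```python
import random
from statistics import mean


def make_linear_critic(weights, bias):
    """Return a toy critic Q(s, a) = w . [s, a] + b.

    Hypothetical stand-in for a neural-network critic.
    """
    def critic(state, action):
        features = list(state) + list(action)
        return sum(w * x for w, x in zip(weights, features)) + bias
    return critic


def ensemble_q(critics, state, action):
    """Aggregate the ensemble by averaging the critics' Q-values.

    Averaging is an assumed aggregation rule; it damps the error of
    any single critic, which is one way to stabilise the value estimate.
    """
    return mean(c(state, action) for c in critics)


def thompson_style_action(critics, state, candidate_actions, rng):
    """Sample one critic and act greedily with respect to it.

    Drawing a critic at random approximates sampling from a posterior
    over value functions, so different draws can prefer different actions.
    """
    critic = rng.choice(critics)
    return max(candidate_actions, key=lambda a: critic(state, a))


# Usage with two deliberately disagreeing critics (illustrative values).
rng = random.Random(0)
critics = [
    make_linear_critic([0.0, 1.0], 0.0),   # prefers larger actions
    make_linear_critic([0.0, -1.0], 0.0),  # prefers smaller actions
]
state = (1.0,)
candidates = [(0.0,), (1.0,)]

print(ensemble_q(critics, state, (1.0,)))
print(thompson_style_action(critics, state, candidates, rng))
```

In this toy setting the averaged estimate cancels the disagreement between the two critics, while the sampled critic commits to one view per decision, which is what drives exploration in the sketch.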

Reinforcement Learning-Based High-Level Ball-Stealing Strategy for RoboCup Keepaway

Towards High Level Skill Learning: Learn to Return Table Tennis Ball Using Monte-Carlo Based Policy Gradient Method

Ensemble Bootstrapped Deep Deterministic Policy Gradient For Vision-Based Robotic Grasping

Learning RoboCup-Keepaway with Kernels

An obstacle avoidance method for robotic arm based on reinforcement learning

Learning To Chase A Ball Efficiently And Smoothly For A Wheeled Robot

Competition-Aware Decision-Making Approach for Mobile Robots in Racing Scenarios

Tactical Reward Shaping: Bypassing Reinforcement Learning with Strategy-Based Goals

Creating a Dynamic Quadrupedal Robotic Goalkeeper with Reinforcement Learning

Stylized Table Tennis Robots Skill Learning with Incomplete Human Demonstrations

Multi-Stage Decision-Making Skill Learning for Soccer Robot

Wheeled Robots playing Chain Catch: Strategies and Evaluation

Design of Strategy Based on Home and Opponent Information in Robot Soccer Competition

Learning of Long-Horizon Sparse-Reward Robotic Manipulator Tasks With Base Controllers

Dynamic Obstacle Avoidance Algorithm for Robot Arm Based on Deep Reinforcement Learning

Two-stage training algorithm for AI robot soccer

Generation a shooting on the walking for soccer simulation 3D league using Q-learning algorithm

Reinforcement Learning with Task Decomposition and Task-Specific Reward System for Automation of High-Level Tasks

Optimal stroke learning with policy gradient approach for robotic table tennis

The Task Decomposition and Dedicated Reward-System-Based Reinforcement Learning Algorithm for Pick-and-Place

Deep Reinforcement Learning Based Robot Arm Manipulation with Efficient Training Data through Simulation