Abstract: With sufficient practice, humans can, through decision-making in the brain, grasp objects they have never seen before. Manipulators, despite their wide range of applications in industrial production, can still only grasp specific objects, because most grasping algorithms rely on prior knowledge such as hand-eye calibration results and object model features and therefore target only specific types of objects. When the task scenario or the target object changes, such algorithms cannot be redeployed effectively. To address these problems, researchers often use reinforcement learning to train grasping algorithms. However, reinforcement learning for manipulator grasping faces three main problems: low sample utilization, poor algorithm stability, and limited exploration. This article combines learning from demonstration (LfD), behavior cloning (BC), and deep deterministic policy gradient (DDPG) to improve sample utilization. It uses multiple critics to jointly evaluate input actions, which addresses the algorithm's instability. Finally, inspired by Thompson sampling, the input action is evaluated from different perspectives, which increases the algorithm's exploration of the environment and reduces the number of interactions with the environment. Based on these ideas, the article designs the EDDPG and EBDDPG algorithms. To further improve the algorithm's generalization ability, the article does not use extra information that is difficult to obtain directly on a physical platform, such as the true coordinates of the target object, and it uses the continuous motion space of the manipulator's end effector in the Cartesian coordinate system as the output of the decision. Simulation results show that, for the same number of interactions, the manipulator's success rate in grasping 1000 random objects more than doubles and reaches state-of-the-art (SOTA) performance.
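
The two ensemble ideas in the abstract (several critics jointly scoring an action, and a Thompson-sampling-style choice of one critic to drive exploration) can be illustrated with a minimal Python sketch. This is not the article's implementation: the linear `LinearCritic`, the use of a plain mean in `ensemble_q`, and the candidate-action selection in `thompson_select` are all hypothetical stand-ins chosen for illustration, since the abstract does not say how the critics are built or combined.

```python
import numpy as np


class LinearCritic:
    """Toy critic Q(s, a) = w . [s, a]; a hypothetical stand-in for a neural network."""

    def __init__(self, dim, rng):
        # Different random initial weights give each critic a different view.
        self.w = rng.normal(scale=0.1, size=dim)

    def q(self, state, action):
        return float(self.w @ np.concatenate([state, action]))


def ensemble_q(critics, state, action):
    """Score an action with every critic and average (the mean is an assumption)."""
    return float(np.mean([c.q(state, action) for c in critics]))


def thompson_select(critics, state, candidate_actions, rng):
    """Thompson-style exploration: draw one critic at random, then act greedily
    with respect to that critic only. Returns (critic index, chosen action)."""
    k = int(rng.integers(len(critics)))
    scores = [critics[k].q(state, a) for a in candidate_actions]
    return k, candidate_actions[int(np.argmax(scores))]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = np.ones(3)                      # hypothetical 3-D observation
    candidates = [np.array([0.1, 0.0]),     # hypothetical 2-D end-effector moves
                  np.array([-0.1, 0.2])]
    critics = [LinearCritic(5, rng) for _ in range(4)]
    k, action = thompson_select(critics, state, candidates, rng)
    print("critic", k, "chose", action, "ensemble Q:", ensemble_q(critics, state, action))
```

Averaging over critics smooths out the error of any single value estimate, while acting on one randomly drawn critic lets disagreement between critics turn into varied, directed exploration rather than undirected action noise.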

Non-Markov Policies to Reduce Sequential Failures in Robot Bin Picking

Failure-aware Policy Learning for Self-assessable Robotics Tasks

Ensemble Bootstrapped Deep Deterministic Policy Gradient For Vision-Based Robotic Grasping

Learning Efficient and Fair Policies for Uncertainty-Aware Collaborative Human-Robot Order Picking

Diff-DAgger: Uncertainty Estimation with Diffusion Policy for Robotic Manipulation

Multi-Object Grasping -- Generating Efficient Robotic Picking and Transferring Policy

Online Tool Selection with Learned Grasp Prediction Models

Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress

Sequential Dexterity: Chaining Dexterous Policies for Long-Horizon Manipulation

Online augmentation of learned grasp sequence policies for more adaptable and data-efficient in-hand manipulation

Single-Shot Learning of Stable Dynamical Systems for Long-Horizon Manipulation Tasks

Feedback Deep Deterministic Policy Gradient With Fuzzy Reward for Robotic Multiple Peg-in-Hole Assembly Tasks

DBPF: A Framework for Efficient and Robust Dynamic Bin-Picking

Learning Practically Feasible Policies for Online 3D Bin Packing

Learning Efficient Policies for Picking Entangled Wire Harnesses: An Approach to Industrial Bin Picking

Operational Policies and Performance Analysis for Overhead Robotic Compact Warehousing Systems with Bin Reshuffling

A Minibatch Stochastic Gradient Descent-Based Learning Metapolicy for Inventory Systems with Myopic Optimal Policy

Optimal decision making in robotic assembly and other trial-and-error tasks

Off-Policy Deep Reinforcement Learning Algorithms for Handling Various Robotic Manipulator Tasks

How Generalizable Is My Behavior Cloning Policy? A Statistical Approach to Trustworthy Performance Evaluation