Abstract: Robotic grasping in the open world is a critical component of manufacturing and automation processes. While numerous existing approaches depend on 2D segmentation output to facilitate the grasping procedure, accurately determining depth from 2D imagery remains a challenge, often leading to limited performance in complex stacking scenarios. In contrast, techniques utilizing 3D point cloud data inherently capture depth information, enabling robots to navigate and manipulate a diverse range of complex stacking scenes. However, such efforts are considerably hindered by the variance in data capture devices and the unstructured nature of the data, which limits their generalizability. Consequently, much research concentrates narrowly on handling designated objects within specific settings, which confines its real-world applicability. This paper presents a novel pipeline capable of executing object grasping tasks in open-world scenarios, even on previously unseen objects, without the need for training. Additionally, our pipeline supports the flexible use of different 3D point cloud segmentation models across a variety of scenes. Leveraging the segmentation results, we propose a training-free binary clustering algorithm that not only improves segmentation precision but can also cluster and localize unseen objects for grasping. In our experiments, we investigate a range of open-world scenarios, and the outcomes underscore the robustness and generalizability of our pipeline, consistent across various environments, robots, cameras, and objects. The code will be made available upon acceptance of the paper.

What problem does this paper attempt to address?

The main problem this paper addresses is cross-device, training-free robotic grasping in open-world scenarios. Specifically, the authors propose solutions to the following key challenges:

1. **Depth Information Extraction and Complex Scene Processing**: Many existing methods rely on 2D image segmentation to assist the grasping process, but accurately determining depth in complex stacking scenarios remains difficult. Although 3D point-cloud data provides rich geometric information (such as surface point coordinates and normals), its irregularity and sparsity limit the generalization ability of methods built on it.
2. **Cross-device Compatibility**: Point-cloud data captured by different devices differs in precision, which makes it hard for existing methods to maintain consistent performance across hardware. Ensuring stable performance on different devices is therefore an important issue.
3. **Training-free Grasping Ability**: Many existing robotic grasping methods require a large amount of training data, which increases deployment cost and limits their application in new environments. A grasping method that adapts to new objects and new scenarios without additional training is therefore valuable.

To address these problems, the authors propose a novel pipeline with the following characteristics:

- **Cross-device Compatibility**: By optimizing point-cloud feature processing and the binary clustering algorithm, the method runs stably on hardware devices of differing precision.
- **Training-free**: A binary clustering algorithm based on point density is introduced, which can cluster and localize unseen objects without additional training.
- **Strong Generalization Ability**: The method performs well in a variety of environments and also handles complex objects that are occluded or stacked.

In a series of experiments, the method performs well under different combinations of robots, cameras, and objects, which supports its robustness and generalization ability in open-world scenarios.
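The summary above does not spell out how the density-based binary clustering works, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that "binary" means splitting points into a dense group (likely object surface) and a sparse group (likely noise or stray points), and that the object is localized by the centroid of the dense group. The function names `local_density` and `binary_density_split`, the `radius` parameter, and the 1-D two-means thresholding are all hypothetical choices made for illustration.

```python
import numpy as np


def local_density(points, radius):
    """Count the neighbours within `radius` of each point (brute force, O(n^2))."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Subtract 1 so a point does not count itself as a neighbour.
    return (dists <= radius).sum(axis=1) - 1


def binary_density_split(points, radius=0.05, iters=20):
    """Return a boolean mask: True for the dense group, False for the sparse one.

    Assumption: the split is a 1-D two-means clustering of per-point density.
    No training data is involved; the threshold is derived from the scene itself.
    """
    density = local_density(points, radius).astype(float)
    lo, hi = density.min(), density.max()
    if lo == hi:
        # Uniform density: nothing to separate, treat every point as dense.
        return np.ones(len(points), dtype=bool)
    for _ in range(iters):
        dense = density > (lo + hi) / 2.0
        new_lo, new_hi = density[~dense].mean(), density[dense].mean()
        if new_lo == lo and new_hi == hi:
            break
        lo, hi = new_lo, new_hi
    return density > (lo + hi) / 2.0


# Usage: a tight synthetic "object" plus a few isolated stray points.
rng = np.random.default_rng(0)
centre = np.array([0.5, 0.2, 0.1])
cluster = rng.uniform(-0.01, 0.01, size=(50, 3)) + centre
strays = np.array([[float(i) + 2.0, 0.0, 0.0] for i in range(10)])
points = np.vstack([cluster, strays])

mask = binary_density_split(points, radius=0.05)
grasp_target = points[mask].mean(axis=0)  # centroid of the dense group
```

Because the threshold is computed from the density distribution of each scene rather than fixed in advance, a sketch like this adapts to sensors with different point spacing, although `radius` would still need to be chosen relative to the sensor's resolution.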

Towards Cross-device and Training-free Robotic Grasping in 3D Open World

A Cascaded Deep Learning Framework for Real-time and Robust Grasp Planning

3D Object Segmentation Using Cross-Window Point Transformer with Latent Semantic Boundary Guidance

Efficient and Robust Robotic Grasping in Cluttered Scenes: A Point Cloud-Based Approach with Heuristic Evaluation.

Picking from Clutter: An Object Segmentation Method for Robot Grasping.

Simulation and Deep Learning on Point Clouds for Robot Grasping

A Robotic Semantic Grasping Method for Pick-and-place Tasks

MVGrasp: Real-time multi-view 3D object grasping in highly cluttered environments

Visual Robotic Object Grasping Through Combining RGB-D Data and 3D Meshes.

You Only Scan Once: A Dynamic Scene Reconstruction Pipeline for 6-DoF Robotic Grasping of Novel Objects

Robotic Continuous Grasping System by Shape Transformer-Guided Multi-Object Category-Level 6D Pose Estimation

Multi-scale deep learning and clustering-based tabletop object instance segmentation for robot manipulation

A Novel RGB-D Cross-Background Robot Grasp Detection Dataset and Background-Adaptive Grasping Network

SparseGrasp: Robotic Grasping via 3D Semantic Gaussian Splatting from Sparse Multi-View RGB Images

Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection

Robotic Grasping With Multi-View Image Acquisition and Model-Based Pose Estimation

A Novel Geometry-based Algorithm for Robust Grasping in Extreme Clutter Environment

Two-stage Grasp Detection Method for Robotics Using Point Clouds and Deep Hierarchical Feature Learning Network

GoalGrasp: Grasping Goals in Partially Occluded Scenarios without Grasp Training

Dynamic Open-Vocabulary 3D Scene Graphs for Long-term Language-Guided Mobile Manipulation

RGBGrasp: Image-based Object Grasping by Capturing Multiple Views during Robot Arm Movement with Neural Radiance Fields