GraspSplats: Efficient Manipulation with 3D Feature Splatting

Mazeyu Ji, Ri-Zhao Qiu, Xueyan Zou, Xiaolong Wang

2024-09-04

Abstract: The ability for robots to perform efficient and zero-shot grasping of object parts is crucial for practical applications and is becoming prevalent with recent advances in Vision-Language Models (VLMs). To bridge the 2D-to-3D gap for representations to support such a capability, existing methods rely on neural fields (NeRFs) via differentiable rendering or point-based projection methods. However, we demonstrate that NeRFs are inappropriate for scene changes due to their implicitness and point-based methods are inaccurate for part localization without rendering-based optimization. To amend these issues, we propose GraspSplats. Using depth supervision and a novel reference feature computation method, GraspSplats generates high-quality scene representations in under 60 seconds. We further validate the advantages of Gaussian-based representation by showing that the explicit and optimized geometry in GraspSplats is sufficient to natively support (1) real-time grasp sampling and (2) dynamic and articulated object manipulation with point trackers. With extensive experiments on a Franka robot, we demonstrate that GraspSplats significantly outperforms existing methods under diverse task settings. In particular, GraspSplats outperforms NeRF-based methods like F3RM and LERF-TOGO, and 2D detection methods.
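The abstract does not spell out how a language query picks out an object part in a feature-splatted scene. A minimal sketch of the general idea follows, assuming each Gaussian stores a feature vector in the same embedding space as a text encoder (for example, a CLIP-like model): compute the cosine similarity between every Gaussian's feature and the query embedding, then keep the Gaussians above a threshold. The `select_part` name, the array shapes, and the 0.5 threshold are illustrative assumptions, not GraspSplats' actual implementation, and the toy vectors stand in for real features.

```python
import numpy as np


def select_part(means: np.ndarray, features: np.ndarray, query: np.ndarray,
                threshold: float = 0.5) -> np.ndarray:
    """Return the 3D centers of Gaussians whose feature matches a text query.

    Hypothetical layout (an assumption, not the paper's data structure):
    means:    (N, 3) Gaussian centers
    features: (N, D) per-Gaussian feature vectors
    query:    (D,)   text embedding in the same feature space
    """
    # Normalize both sides so the dot product equals cosine similarity.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    similarity = f @ q  # (N,) one score per Gaussian
    return means[similarity > threshold]


# Toy scene: 3 Gaussians with 4-D features; only the first two resemble the query.
means = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [1.0, 1.0, 1.0]])
features = np.array([[1.0, 0.1, 0.0, 0.0],
                     [0.9, 0.0, 0.1, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])
query = np.array([1.0, 0.0, 0.0, 0.0])
part = select_part(means, features, query)
print(part.shape)  # (2, 3)
```

Because the Gaussians are explicit 3D primitives, the selected subset can directly restrict where grasps are sampled, without rendering a 2D mask and lifting it back to 3D.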

Robotics, Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper addresses efficient zero-shot grasping and manipulation of object parts by robots in real-world applications. Specifically:

1. **Zero-shot Grasping**: Enabling robots to understand and execute specific tasks, such as a kitchen robot pulling drawers or grabbing tools based on recipe instructions.
2. **Operation in Dynamic Scenes**: NeRF-based methods handle scene changes poorly because they require retraining and cannot update in real time; point-based methods, while efficient, perform poorly under visual occlusion and are inaccurate for part localization without rendering-based optimization.
3. **Understanding Geometry and Semantics**: To achieve fine manipulation, robots need to understand both the geometric structure and the semantic information of the scene.

The paper proposes a new method called GraspSplats, which uses depth supervision and a novel reference feature computation method to generate high-quality scene representations in under 60 seconds. Its explicit representation supports real-time grasp sampling and dynamic object manipulation.

### Main Contributions

1. Proposes a framework advocating 3D Gaussian Splatting (3DGS) as the grasp representation, which offers higher accuracy and efficiency in zero-shot part-level grasping than existing methods.
2. Implements an editable, high-fidelity representation that extends beyond zero-shot manipulation in static scenes to dynamic and articulated object manipulation.
3. Conducts extensive experiments showing that GraspSplats outperforms NeRF-based and point-based methods in zero-shot grasping in both static and dynamic scenes.
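The summary credits the explicit representation with supporting dynamic manipulation. One way explicit Gaussians can be kept in sync with a moving rigid object is sketched below; this is an illustration under stated assumptions, not necessarily the paper's procedure. It assumes a point tracker supplies the 3D positions of a few keypoints on the object before and after the motion, fits a rigid transform to them with the Kabsch algorithm, and applies that transform to the object's Gaussian centers. The function names and toy data are hypothetical, and a full update would also rotate each Gaussian's orientation, which is omitted here.

```python
import numpy as np


def fit_rigid_transform(src: np.ndarray, dst: np.ndarray):
    """Kabsch: find R, t minimizing sum ||R @ src_i + t - dst_i||^2 over matched points."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t


def move_gaussians(means: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply the fitted rigid motion to the (N, 3) Gaussian centers of one object."""
    return means @ R.T + t


# Toy example: tracked keypoints rotate 90 degrees about z and shift by (0.2, 0, 0).
theta = np.pi / 2
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.2, 0.0, 0.0])
keypoints_before = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                             [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
keypoints_after = keypoints_before @ R_true.T + t_true

R, t = fit_rigid_transform(keypoints_before, keypoints_after)
object_means = np.array([[0.5, 0.5, 0.0]])
print(move_gaussians(object_means, R, t))
```

Fitting one transform per rigid object keeps the update cheap, since only a handful of tracked points is needed to move every Gaussian on that object; an articulated object would need one such fit per rigid part.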
