GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

Yanjie Ze, Ge Yan, Yueh-Hua Wu, Annabella Macaluso, Yuying Ge, Jianglong Ye, Nicklas Hansen, Li Erran Li, Xiaolong Wang
2024-07-28
Abstract: It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world environments. To achieve this goal, the robot needs to have a comprehensive understanding of the 3D structure and semantics of the scene. In this work, we present GNFactor, a visual behavior cloning agent for multi-task robotic manipulation with Generalizable Neural feature Fields. GNFactor jointly optimizes a generalizable neural field (GNF) as a reconstruction module and a Perceiver Transformer as a decision-making module, leveraging a shared deep 3D voxel representation. To incorporate semantics in 3D, the reconstruction module utilizes a vision-language foundation model (e.g., Stable Diffusion) to distill rich semantic information into the deep 3D voxel. We evaluate GNFactor on 3 real robot tasks and perform detailed ablations on 10 RLBench tasks with a limited number of demonstrations. We observe a substantial improvement of GNFactor over current state-of-the-art methods in seen and unseen tasks, demonstrating the strong generalization ability of GNFactor. Our project website is https://yanjieze.com/GNFactor/.
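The joint optimization described above can be illustrated with a minimal, dependency-free sketch. It assumes a NeRF-style alpha-compositing renderer that turns per-sample densities into weights and uses those weights to render both RGB and a semantic feature vector along one ray. The rendered feature is regressed onto a "teacher" feature (standing in for a Stable Diffusion feature), and that reconstruction loss is added to a cross-entropy action loss on one discretized action. All function names and the weight `lambda_recon` are hypothetical, and the single discretized action head is a simplification. This is not the authors' implementation.

```python
import math


def render_ray(densities, colors, features, deltas):
    """Alpha-composite per-sample RGB and feature vectors along one ray (NeRF-style)."""
    transmittance = 1.0
    rgb = [0.0] * len(colors[0])
    feat = [0.0] * len(features[0])
    for sigma, c, f, d in zip(densities, colors, features, deltas):
        alpha = 1.0 - math.exp(-sigma * d)  # opacity of this sample
        weight = transmittance * alpha      # contribution to the final pixel
        rgb = [acc + weight * v for acc, v in zip(rgb, c)]
        feat = [acc + weight * v for acc, v in zip(feat, f)]
        transmittance *= 1.0 - alpha        # light remaining for later samples
    return rgb, feat


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, target_index):
    """Numerically stable softmax cross-entropy for one discretized action."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]


def joint_loss(action_logits, expert_action, rendered_rgb, gt_rgb,
               rendered_feat, teacher_feat, lambda_recon=0.01):
    """Behavior-cloning loss plus weighted RGB and feature-distillation loss.

    lambda_recon is a hypothetical weight chosen for illustration.
    """
    l_action = cross_entropy(action_logits, expert_action)
    l_recon = mse(rendered_rgb, gt_rgb) + mse(rendered_feat, teacher_feat)
    return l_action + lambda_recon * l_recon
```

Because the action loss and the reconstruction loss would both be backpropagated into the same voxel features in a real model, the reconstruction term acts as an auxiliary signal that shapes the shared representation the policy reads from.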
Robotics, Computer Vision and Pattern Recognition, Machine Learning