Learning Generalizable Feature Fields for Mobile Manipulation

Ri-Zhao Qiu, Yafei Hu, Ge Yang, Yuchen Song, Yang Fu, Jianglong Ye, Jiteng Mu, Ruihan Yang, Nikolay Atanasov, Sebastian Scherer, Xiaolong Wang
2024-03-12
Abstract: An open problem in mobile manipulation is how to represent objects and scenes in a unified manner, so that robots can use the representation both for navigating in the environment and for manipulating objects. The latter requires capturing intricate geometry while understanding fine-grained semantics, whereas the former involves capturing the complexity inherent to an expansive physical scale. In this work, we present GeFF (Generalizable Feature Fields), a scene-level generalizable neural feature field that acts as a unified representation for both navigation and manipulation and runs in real time. To do so, we treat generative novel view synthesis as a pre-training task, and then align the resulting rich scene priors with natural language via CLIP feature distillation. We demonstrate the effectiveness of this approach by deploying GeFF on a quadrupedal robot equipped with a manipulator. We evaluate GeFF's ability to generalize to open-set objects, as well as its running time, when performing open-vocabulary mobile manipulation in dynamic scenes.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?