LASO: Language-guided Affordance Segmentation on 3D Object

Yicong Li, Na Zhao, Junbin Xiao, Chun Feng, Xiang Wang, Tat-seng Chua
DOI: https://doi.org/10.1109/cvpr52733.2024.01351
2024-01-01
Abstract: Segmenting affordance in 3D data is key for bridging perception and action in robots. Existing efforts mostly focus on the visual side and overlook affordance knowledge from a semantic aspect. This oversight not only limits their generalization to unseen objects but, more importantly, hinders their synergy with large language models (LLMs), which are excellent task planners that can decompose an overarching command into agent-actionable instructions. In this regard, we propose a novel task, Language-guided Affordance Segmentation on 3D Object (LASO), which challenges a model to segment the part of a 3D object relevant to a given affordance question. To facilitate the task, we contribute a dataset comprising 19,751 point-question pairs, covering 8,434 object shapes and 870 expert-crafted questions. As a pioneer solution, we further propose PointRefer, which features an adaptive fusion module to identify target affordance regions at different scales. To ensure text-aware segmentation, we adopt a set of affordance queries conditioned on linguistic cues to generate dynamic kernels. These kernels are then convolved with point features to generate a segmentation mask. Comprehensive experiments and analyses validate PointRefer's effectiveness. With these efforts, we hope that LASO can steer the direction of 3D affordance, guiding it towards enhanced integration with the evolving capabilities of LLMs. Code and data are available at https://github.com/yl3800/LASO.
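The last step described in the abstract, applying dynamic kernels to point features to obtain a mask, can be illustrated with a minimal sketch. This is not the PointRefer implementation: the function name `dynamic_kernel_mask`, the feature sizes, and the choice to average the kernel responses before a sigmoid are all assumptions made for illustration. It relies only on the fact that a kernel of width 1 applied to per-point features reduces to a dot product per point.

```python
import math
import random


def dynamic_kernel_mask(point_feats, kernels):
    """Score each point with a set of text-conditioned kernels (illustrative only).

    point_feats: list of N feature vectors, each of length D.
    kernels:     list of K kernel vectors, each of length D (hypothetically
                 produced from affordance queries conditioned on the question).
    Returns a list of N scores in [0, 1].
    """
    scores = []
    for feat in point_feats:
        # A width-1 convolution over points is a dot product per point.
        responses = [sum(f * w for f, w in zip(feat, kernel)) for kernel in kernels]
        # Assumed aggregation: average the K responses, then squash with a sigmoid.
        mean_logit = sum(responses) / len(responses)
        scores.append(1.0 / (1.0 + math.exp(-mean_logit)))
    return scores


random.seed(0)
# Toy inputs: 16 points with 8-dimensional features, 4 kernels.
point_feats = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
kernels = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]

mask = dynamic_kernel_mask(point_feats, kernels)
print(len(mask))  # one score per point
```

Because the kernels would be produced from the question rather than fixed after training, the same point features could yield different masks for different affordance questions, which is the sense in which the segmentation is text-aware.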