Reimagining 3D Visual Grounding: Instance Segmentation and Transformers for Fragmented Point Cloud Scenarios.

Zehan Tan, Weidong Yang, Zhiwei Wang
DOI: https://doi.org/10.1145/3595916.3626405
2023-01-01
Abstract: This work introduces a pioneering, engineerable approach to 3D visual grounding (3DVG). It summarizes the current challenges for 2D visual grounding (2DVG) and 3DVG: the absence of depth information in 2DVG, the memory and computational demands of global point clouds, limitations in dynamic scenarios, and a limited understanding of spatial localization reference frames. As a solution, we propose Re_3DVG, a method for fragmented point cloud scenarios. Using instance segmentation and transformer models, our approach establishes robust correspondences between text queries and object instances within the shared visible range. We also introduce the FragCloud3DRef dataset, built on ScanNet and supplemented with RGB data, object segmentation, and textual descriptions, which strengthens the effectiveness of the proposed model. Experimental results show that our model outperforms conventional 3DVG and 2DVG models, establishing a strong benchmark for future research in this field. The source code and dataset are available at https://github.com/zehantan6970/Reimagining_3DVG.