3D Object Segmentation Using Cross-Window Point Transformer with Latent Semantic Boundary Guidance

Qide Wang, Daxin Liu, Zhenyu Liu, Jiatong Xu, Jianrong Tan
DOI: https://doi.org/10.1109/tmm.2023.3342697
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Accurate 3D object segmentation in point clouds is fundamental to industrial robot applications, such as robot manipulation and digital twins, which require an understanding of the 3D environment. However, the unstructured and unordered nature of point clouds makes this challenging, especially for incomplete 3D data captured from a single view in real-world scenarios. To this end, this paper proposes a novel 3D object segmentation framework (3DT-Seg) based on the Cross-Window Point Transformer (CP-Former). CP-Former uses a bidirectional cross-attention mechanism to capture the long-range dependencies between local windows and latent semantic boundaries, enhancing the point-wise features extracted from irregular point clouds. In addition, a contrastive learning loss and an adaptive dual aggregation strategy are applied to semantic transition regions during semantic supervision and instance clustering, respectively. In this way, the latent boundary information is further exploited to improve overall segmentation performance. Experiments on the popular S3DIS benchmark dataset show that the proposed approach achieves state-of-the-art performance in both semantic and instance segmentation. Furthermore, a real-world point cloud dataset (IP-Cloud) for the robotic grasping task is presented to validate the effectiveness of the method in practice, where it also achieves strong performance.
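To make the idea of bidirectional cross-attention concrete, the following is a minimal sketch and not the authors' CP-Former implementation. It assumes plain scaled dot-product attention with a residual connection and no learned projections; the names `cross_attend`, `bidirectional_cross_attention`, `window_feats`, and `boundary_feats` are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Each query row attends to every row of keys_values.

    Assumption: keys and values are the same vectors (no learned
    projections), and the attended result is added back to the query.
    """
    dim = len(queries[0])
    updated = []
    for q in queries:
        # Scaled dot-product similarity between this query and every key.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        attended = [
            sum(w * kv[j] for w, kv in zip(weights, keys_values))
            for j in range(dim)
        ]
        # Residual connection keeps the original feature.
        updated.append([qi + ai for qi, ai in zip(q, attended)])
    return updated


def bidirectional_cross_attention(window_feats, boundary_feats):
    """Update both feature sets, each attending to the other (hypothetical)."""
    new_window = cross_attend(window_feats, boundary_feats)
    new_boundary = cross_attend(boundary_feats, window_feats)
    return new_window, new_boundary
```

In this sketch, window features are enriched with boundary information and boundary features with window information in the same step, which is the sense of "bidirectional" assumed here; a real model would add learned query, key, and value projections and multiple heads.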