RPV-CASNet: range-point-voxel integration with channel self-attention network for lidar point cloud segmentation

Jiajiong Li, Chuanxu Wang, Chenyang Wang, Min Zhao, Zitai Jiang
DOI: https://doi.org/10.1007/s10489-024-05553-4
IF: 5.3
2024-06-16
Applied Intelligence
Abstract: Maximizing the advantages of different views while mitigating their respective disadvantages in fine-grained segmentation tasks is an important challenge in point cloud multi-view fusion. Traditional multi-view fusion methods overlook two critical problems: (1) the loss of depth and quantization information caused by projection and voxelization, which results in "anomalies" in the extracted features; and (2) the large differences in object sizes among different views, which must be accounted for during point cloud learning, together with the need to tune fusion efficiency in order to improve network performance. In this paper, we propose a new algorithm that uses channel self-attention to fuse range, point and voxel views, abbreviated as RPV-CASNet. RPV-CASNet integrates the three views (range, point and voxel) in a more subtle way through an interactive structure, the range-point-voxel cross-adaptive layer (RPVLayer for short), to take full advantage of the differences among them. The RPVLayer contains two key designs: the Feature Refinement Module (FRM) and the Multi-Fine-Grained Feature Self-Attention Module (MFGFSAM). Specifically, the FRM re-infers the representation of points that carry anomalous features, thereby correcting those features. The MFGFSAM addresses two challenges: efficiently aggregating tokens from distant regions and preserving multiscale features within a single attention layer. In addition, we design a Dynamic Feature Pyramid Extractor (DFPE) for network deployment, which extracts rich features from spherical range images. Our method achieves mIoU scores of 69.8% and 77.1% on the SemanticKITTI and nuScenes datasets, respectively.
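The information loss that the abstract attributes to mapping and voxelization can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 64 x 2048 image size, the vertical field of view (+3 to -25 degrees, typical of a 64-beam sensor) and the 0.5 m voxel size are all assumptions chosen for illustration. The sketch projects points onto a spherical range image, quantizes them into voxels, and counts how many points end up sharing a cell with another point.

```python
import numpy as np

def range_projection(points, height=64, width=2048,
                     fov_up_deg=3.0, fov_down_deg=-25.0):
    """Map (N, 3) xyz points to (row, col) pixels of a spherical range image.

    Image size and field of view are assumed values, not taken from the paper.
    """
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)
    fov = fov_up - fov_down
    depth = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / np.maximum(depth, 1e-8))
    # Normalize the angles to image coordinates, then quantize to pixels.
    col = 0.5 * (1.0 - yaw / np.pi) * width
    row = (1.0 - (pitch - fov_down) / fov) * height
    col = np.clip(np.floor(col), 0, width - 1).astype(np.int64)
    row = np.clip(np.floor(row), 0, height - 1).astype(np.int64)
    return row, col, depth

def voxelize(points, voxel_size=0.5):
    """Quantize (N, 3) xyz points to integer voxel indices (assumed 0.5 m grid)."""
    return np.floor(points / voxel_size).astype(np.int64)

def lost_points(cells):
    """Count points that do not get a cell of their own (many-to-one collisions)."""
    return len(cells) - len(np.unique(cells, axis=0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic scene: points spread over 40 m x 40 m, within 2 m of the sensor height.
    xyz = np.column_stack([
        rng.uniform(-20.0, 20.0, 5000),
        rng.uniform(-20.0, 20.0, 5000),
        rng.uniform(-2.0, 2.0, 5000),
    ])
    row, col, _ = range_projection(xyz)
    pixels = np.stack([row, col], axis=1)
    print("points colliding in the range image:", lost_points(pixels))
    print("points colliding in the voxel grid: ", lost_points(voxelize(xyz)))
```

Points that collide in a pixel keep only one depth value in the range view, and points that collide in a voxel lose their sub-voxel offsets; these are the kinds of mapping and quantization losses the abstract refers to.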
computer science, artificial intelligence