SIANet: 3D object detection with structural information augment network

Jing Zhou, Tengxing Lin, Zixin Gong, Xinhan Huang
DOI: https://doi.org/10.1049/cvi2.12272
IF: 1.484
2024-01-24
IET Computer Vision
Abstract: 3D object detection from point clouds has been widely applied in autonomous driving in recent years. In practice, the point clouds of some objects are incomplete because of occlusion or long distance, so these objects carry insufficient structural information, which greatly degrades detection performance. To address this challenge, the authors design a Structural Information Augment (SIA) Network for 3D object detection, named SIANet. Specifically, the SIA module reconstructs the complete shapes of objects within proposals to enhance their geometric features; the reconstructed structural information is then fused into the spatial feature of each object for box refinement, yielding more accurate detection boxes. In addition, the authors construct a novel U‐Net‐like Context‐enhanced Transformer backbone network, which stacks Context‐enhanced Transformer modules and an upsampling branch to capture contextual information efficiently and generate high‐quality proposals for the SIA module. Extensive experiments on the KITTI and Waymo datasets show that SIANet effectively improves detection performance, surpassing the baseline network by 1.04% mean Average Precision (mAP) on the KITTI dataset and by 0.75% LEVEL_2 mAP on the Waymo dataset.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic