MSIT-Det: Multi-Scale Feature Aggregation with Iterative Transformer Networks for 3D Object Detection

Xi Li, Yuanyuan Chen, Yisheng Lv
DOI: https://doi.org/10.1109/itsc57777.2023.10422272
2023-01-01
Abstract: LiDAR-based perception is pivotal for ensuring the safety of autonomous driving. Although detection methods are continually being optimized for both timeliness and accuracy, there is still room for improvement. This paper introduces MSIT-Det, a two-stage 3D object detection framework designed explicitly for LiDAR point cloud data. MSIT-Det focuses on proposal regions and uses multi-scale graph-structured feature aggregation (MSGA) to extract graph geometric information across diverse scales. To further enhance feature expression, we propose iterative Transformer networks (ITNet), which integrate three modules: attention to concreteness, attention to abstraction, and unified feature representation. To optimize the framework, we incorporate parallel loss functions that simultaneously refine each scale and the final output. Experimental results on the KITTI dataset demonstrate the effectiveness of MSIT-Det, showing promising performance compared to existing methods in terms of detection accuracy and efficiency.