Channel-wise and Spatially-Guided Multimodal Feature Fusion Network for 3D Object Detection in Autonomous Vehicles

Muhammad Uzair, Jian Dong, Ronghua Shi, Husnain Mushtaq, Irshad Ullah
DOI: https://doi.org/10.1109/tgrs.2024.3476072
IF: 8.2
2024-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Accurate 3D object detection is vital in autonomous driving. Traditional LiDAR-only models struggle with sparse point clouds. We propose an approach that integrates LiDAR and camera data to exploit the strengths of each sensor while compensating for their individual limitations. Our research introduces the Channel-wise and Spatially-guided Multimodal feature fusion network (CSMNET) for 3D object detection. First, our method enhances LiDAR data by projecting it onto a 2D plane, enabling the extraction of class-specific features from a probability map. Second, we design class-based farthest point sampling (C-FPS), which boosts the selection of foreground points by weighting points according to geometric or probability features while ensuring diversity among the selected points. Third, we develop a parallel attention-based multimodal fusion mechanism that achieves higher resolution than raw LiDAR points. This fusion mechanism integrates two attention mechanisms: channel attention for LiDAR data and spatial attention for camera data. These mechanisms enhance the utilization of semantic features in a region of interest (ROI) to obtain more representative point features, leading to more effective fusion of information from both LiDAR and camera sources. On the KITTI dataset, CSMNET achieves an Average Precision (AP) in Bird’s Eye View (BEV) detection of 90.16% (easy), 85.18% (moderate), and 80.51% (hard), with a mean AP (mAP) of 85.12%. In 3D detection, CSMNET attains 82.05% (easy), 72.64% (moderate), and 67.10% (hard), with an mAP of 73.75%. For 2D detection, the scores are 95.47% (easy), 93.25% (moderate), and 86.68% (hard), yielding an mAP of 91.72%.
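The abstract does not give the exact formulation of C-FPS, so the following is only a minimal sketch of the general idea of weighted farthest point sampling, not the authors' implementation. It assumes each point carries a scalar weight (for example a foreground probability) and that the selection score is the weight multiplied by the distance to the nearest already-selected point. The function name `weighted_fps` and the toy data are hypothetical.

```python
import numpy as np

def weighted_fps(points, weights, num_samples):
    """Farthest point sampling with per-point weights (illustrative assumption).

    Each candidate is scored as weight * distance to the nearest point already
    selected, so high-weight points are favoured while the distance term keeps
    the selection spread out.
    """
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n = points.shape[0]
    if not 0 < num_samples <= n:
        raise ValueError("num_samples must be in the range 1..len(points)")

    selected = np.empty(num_samples, dtype=int)
    selected[0] = int(np.argmax(weights))  # seed with the highest-weight point
    min_dist = np.full(n, np.inf)          # distance to nearest selected point

    for i in range(1, num_samples):
        last = points[selected[i - 1]]
        dist = np.linalg.norm(points - last, axis=1)
        min_dist = np.minimum(min_dist, dist)
        score = weights * min_dist
        score[selected[:i]] = -1.0         # never pick the same point twice
        selected[i] = int(np.argmax(score))
    return selected

# Toy cloud: three "foreground" points near the origin, two "background" points.
toy_points = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [5, 0, 0], [6, 0, 0]])
toy_weights = np.array([0.9, 0.9, 0.9, 0.1, 0.1])
foreground_biased = weighted_fps(toy_points, toy_weights, 3)
unweighted = weighted_fps(toy_points, np.ones(5), 3)
```

With uniform weights the routine reduces to ordinary farthest point sampling, which tends to reach for distant background points. With foreground-heavy weights the same budget of samples stays on the high-weight cluster.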