ClusterFusion: Leveraging Radar Spatial Features for Radar-Camera 3D Object Detection in Autonomous Vehicles

Irfan Tito Kurniawan, Bambang Riyanto Trilaksono
DOI: https://doi.org/10.1109/ACCESS.2023.3328953
2023-11-05
Abstract: Thanks to the complementary nature of millimeter wave radar and camera, deep learning-based radar-camera 3D object detection methods may reliably produce accurate detections even in low-visibility conditions. This makes them preferable to use in autonomous vehicles' perception systems, especially as the combined cost of both sensors is cheaper than the cost of a lidar. Recent radar-camera methods commonly perform feature-level fusion which often involves projecting the radar points onto the same plane as the image features and fusing the extracted features from both modalities. While performing fusion on the image plane is generally simpler and faster, projecting radar points onto the image plane flattens the depth dimension of the point cloud which might lead to information loss and makes extracting the spatial features of the point cloud harder. We proposed ClusterFusion, an architecture that leverages the local spatial features of the radar point cloud by clustering the point cloud and performing feature extraction directly on the point cloud clusters before projecting the features onto the image plane. ClusterFusion achieved the state-of-the-art performance among all radar-monocular camera methods on the test slice of the nuScenes dataset with 48.7% nuScenes detection score (NDS). We also investigated the performance of different radar feature extraction strategies on point cloud clusters: a handcrafted strategy, a learning-based strategy, and a combination of both, and found that the handcrafted strategy yielded the best performance. The main goal of this work is to explore the use of radar's local spatial and point-wise features by extracting them directly from radar point cloud clusters for a radar-monocular camera 3D object detection method that performs cross-modal feature fusion on the image plane.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses how to use millimeter-wave radar and a monocular camera effectively for 3D object detection in autonomous vehicles. The authors propose an architecture named ClusterFusion that targets two key challenges in existing radar-camera fusion methods:

1. **Information loss**: Existing methods usually project the radar point cloud onto the image plane before feature-level fusion. This is simple and fast, but it flattens the depth dimension of the point cloud, which can lose spatial information and makes the local spatial features of the point cloud harder to extract.
2. **Limitations of feature extraction**: Radar point clouds are extremely sparse, so extracting useful features from them directly is difficult. Existing radar-camera fusion methods often fail to make full use of the spatial and point-wise information that the radar point cloud provides.

ClusterFusion addresses these challenges in three steps:

- **Point cloud clustering**: ClusterFusion uses preliminary 3D object detection results to filter the radar points and group them into point cloud clusters. This step relies on a frustum association mechanism inspired by CenterFusion.
- **Direct feature extraction from point cloud clusters**: ClusterFusion then extracts features directly from these clusters, before any projection onto the image plane. This makes the local spatial features of the point cloud easier to capture.
- **Cross-modal feature fusion on the image plane**: Finally, the extracted radar features are projected onto the image plane as a radar feature map and fused with the image feature map. The fused feature map is passed to the regression head, which produces the final 3D object detection results.
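The first two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the pinhole projection, the fixed `depth_margin`, the field names, and the particular handcrafted statistics (point count, mean depth, extents, mean velocity) are all assumptions chosen for illustration, since the summary does not specify them.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RadarPoint:
    x: float   # lateral position in the camera frame (m)
    y: float   # vertical position in the camera frame (m)
    z: float   # depth along the optical axis (m)
    vx: float  # lateral velocity component (m/s), illustrative field
    vz: float  # velocity component along the optical axis (m/s), illustrative field


@dataclass
class Detection:
    box: tuple    # preliminary 2D box (u_min, v_min, u_max, v_max) in pixels
    depth: float  # preliminary depth estimate (m)


def project(p, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    return fx * p.x / p.z + cx, fy * p.y / p.z + cy


def frustum_cluster(points, det, intrinsics, depth_margin):
    """Keep points that project into the 2D box and lie near the estimated depth.

    The box and the depth window together bound a frustum-shaped region.
    The fixed depth_margin is an assumption of this sketch.
    """
    u_min, v_min, u_max, v_max = det.box
    cluster = []
    for p in points:
        if p.z <= 0:  # behind the camera, cannot be projected
            continue
        u, v = project(p, *intrinsics)
        in_box = u_min <= u <= u_max and v_min <= v <= v_max
        in_depth = abs(p.z - det.depth) <= depth_margin
        if in_box and in_depth:
            cluster.append(p)
    return cluster


def handcrafted_features(cluster):
    """Summarize a cluster in 3D, before any projection (hypothetical feature set)."""
    if not cluster:
        return None
    xs = [p.x for p in cluster]
    zs = [p.z for p in cluster]
    return {
        "count": len(cluster),
        "mean_depth": mean(zs),
        "depth_extent": max(zs) - min(zs),
        "lateral_extent": max(xs) - min(xs),
        "mean_vx": mean(p.vx for p in cluster),
        "mean_vz": mean(p.vz for p in cluster),
    }
```

Because the statistics are computed on the 3D cluster, quantities such as `depth_extent` remain available; they would be lost if the points were flattened onto the image plane first.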
In this way, ClusterFusion keeps the simplicity and speed of feature-level fusion on the image plane while making fuller use of the spatial and point-wise features of the radar point cloud. It achieves state-of-the-art performance among radar-monocular camera methods on the test slice of the nuScenes dataset, with a nuScenes detection score (NDS) of 48.7%. Reliable detection in low-visibility conditions is part of the motivation for radar-camera fusion in general.
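Fusion on the image plane can be sketched in a similar spirit. The sketch below assumes that each cluster's feature vector is painted into the feature-map cells covered by its 2D box, that nearer clusters overwrite farther ones where boxes overlap, and that fusion is a channel-wise concatenation. None of these choices is confirmed by the summary; the function names and the `stride` parameter are hypothetical.

```python
def rasterize_cluster_features(clusters, height, width, stride):
    """Paint per-cluster feature vectors onto an image-plane feature map.

    clusters: list of (box, depth, features), where box is
    (u_min, v_min, u_max, v_max) in pixels and features is a list of floats.
    Returns a map indexed as [channel][row][col].
    """
    n_channels = len(clusters[0][2]) if clusters else 0
    fmap = [[[0.0] * width for _ in range(height)] for _ in range(n_channels)]
    # Draw far-to-near so that closer objects win where boxes overlap.
    for box, depth, feats in sorted(clusters, key=lambda c: -c[1]):
        # Convert pixel coordinates to feature-map cells.
        u_min, v_min, u_max, v_max = (int(c // stride) for c in box)
        for row in range(max(v_min, 0), min(v_max, height - 1) + 1):
            for col in range(max(u_min, 0), min(u_max, width - 1) + 1):
                for ch, value in enumerate(feats):
                    fmap[ch][row][col] = value
    return fmap


def fuse(image_fmap, radar_fmap):
    """Channel-wise concatenation of two [C][H][W] maps of equal spatial size."""
    return image_fmap + radar_fmap
```

The fused map has the image channels followed by the radar channels, so a regression head operating on it sees both modalities at every image-plane location.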