SCKD: Semi-Supervised Cross-Modality Knowledge Distillation for 4D Radar Object Detection

Ruoyu Xu, Zhiyu Xiang, Chenwei Zhang, Hanzhi Zhong, Xijun Zhao, Ruina Dang, Peng Xu, Tianyu Pu, Eryun Liu
2024-12-19
Abstract: 3D object detection is one of the fundamental perception tasks for autonomous vehicles. Fulfilling such a task with a 4D millimeter-wave radar is very attractive since the sensor is able to acquire 3D point clouds similar to Lidar while maintaining robust measurements under adverse weather. However, due to the high sparsity and noise associated with the radar point clouds, the performance of the existing methods is still much lower than expected. In this paper, we propose a novel Semi-supervised Cross-modality Knowledge Distillation (SCKD) method for 4D radar-based 3D object detection. It characterizes the capability of learning the feature from a Lidar-radar-fused teacher network with semi-supervised distillation. We first propose an adaptive fusion module in the teacher network to boost its performance. Then, two feature distillation modules are designed to facilitate the cross-modality knowledge transfer. Finally, a semi-supervised output distillation is proposed to increase the effectiveness and flexibility of the distillation framework. With the same network structure, our radar-only student trained by SCKD boosts the mAP by 10.38% over the baseline and outperforms the state-of-the-art works on the VoD dataset. The experiment on ZJUODset also shows 5.12% mAP improvements on the moderate difficulty level over the baseline when extra unlabeled data are available. Code is available at https://github.com/Ruoyu-Xu/SCKD.
Computer Vision and Pattern Recognition, Artificial Intelligence, Image and Video Processing
What problem does this paper attempt to address?
This paper addresses the poor performance of 3D object detection based on 4D millimeter-wave radar. Existing 4D radar methods fall far short of expectations because radar point clouds are highly sparse and noisy. Although 4D radar remains robust in bad weather and provides 3D point clouds similar to Lidar, its point cloud density is only about one-tenth that of Lidar, and it produces "ghost points" caused by the multipath effect, which greatly reduce measurement accuracy.

To address these problems, the paper proposes **SCKD (Semi-Supervised Cross-modality Knowledge Distillation)**. The authors aim to significantly improve 4D radar-based 3D object detection while keeping inference real-time. The core idea is to learn features from a multi-modal fusion teacher network and transfer them to a student network that uses only radar data, thereby strengthening the student's detection ability.

### Main contributions of SCKD

1. **A novel semi-supervised cross-modality distillation framework**: By learning from the teacher network, a simple student network greatly improves its performance while maintaining real-time efficiency.
2. **An adaptive fusion module**: Embedded in the teacher network, it fuses Lidar and radar features, which improves the teacher's performance and reduces the difficulty of knowledge transfer.
3. **Two feature distillation modules**: Lidar-to-radar feature distillation (LRFD) and fusion-to-radar feature distillation (FRFD), which strengthen feature distillation.
4. **Semi-supervised output distillation (SSOD)**: The student no longer requires ground-truth supervision, which makes the method more flexible and lets it exploit large amounts of unlabeled data.
5. **Experimental verification**: On the VoD dataset, the radar-only student trained with SCKD improves mAP by 10.38% over the baseline with the same network structure and outperforms state-of-the-art methods. On ZJUODset, it improves mAP by 5.12% at the moderate difficulty level when extra unlabeled data are available.

### Key technologies of the solution

- **Teacher network**: A Lidar-radar dual-modal fusion network with an adaptive fusion module, which produces richer semantic information.
- **Feature distillation**: The LRFD and FRFD modules transfer the teacher network's features to the student network.
- **Semi-supervised output distillation**: The teacher network's predictions serve as supervision signals, reducing the dependence on expensive labeled data.

Together, these components improve 4D radar-based 3D object detection and make the approach practical for autonomous driving, where unlabeled sensor data is plentiful.
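The three distillation signals described above (LRFD, FRFD, SSOD) could be combined into a single training objective roughly as follows. This is a minimal, framework-free sketch and not the authors' implementation: the summary does not give the loss formulas, so the mean-squared-error feature loss, the confidence-thresholded L1 box loss, the one-to-one pairing of teacher and student boxes, the choice of which student features each module supervises, and every function name, parameter, and weight below are assumptions made for illustration.

```python
from typing import Sequence


def feature_distill_loss(teacher: Sequence[float], student: Sequence[float]) -> float:
    """Mean squared error between flattened teacher and student features.

    MSE is an assumed loss form; the summary does not state the one SCKD uses.
    """
    if len(teacher) != len(student) or not teacher:
        raise ValueError("feature vectors must be non-empty and equally sized")
    return sum((t - s) ** 2 for t, s in zip(teacher, student)) / len(teacher)


def output_distill_loss(
    teacher_scores: Sequence[float],
    teacher_boxes: Sequence[Sequence[float]],
    student_boxes: Sequence[Sequence[float]],
    score_threshold: float = 0.5,
) -> float:
    """L1 box loss against confident teacher predictions only.

    No ground-truth labels are used, so this term also works on unlabeled
    frames. Assumes student_boxes[i] is already matched to teacher_boxes[i].
    """
    kept = [
        (t_box, s_box)
        for score, t_box, s_box in zip(teacher_scores, teacher_boxes, student_boxes)
        if score >= score_threshold
    ]
    if not kept:
        return 0.0  # no confident teacher prediction in this frame
    total = sum(abs(t - s) for t_box, s_box in kept for t, s in zip(t_box, s_box))
    return total / len(kept)


def total_distill_loss(
    lidar_feat: Sequence[float],
    fused_feat: Sequence[float],
    radar_feat_early: Sequence[float],
    radar_feat_late: Sequence[float],
    teacher_scores: Sequence[float],
    teacher_boxes: Sequence[Sequence[float]],
    student_boxes: Sequence[Sequence[float]],
    w_lrfd: float = 1.0,
    w_frfd: float = 1.0,
    w_ssod: float = 1.0,
) -> float:
    """Weighted sum of the three terms (weights are hypothetical)."""
    lrfd = feature_distill_loss(lidar_feat, radar_feat_early)  # Lidar -> radar
    frfd = feature_distill_loss(fused_feat, radar_feat_late)   # fusion -> radar
    ssod = output_distill_loss(teacher_scores, teacher_boxes, student_boxes)
    return w_lrfd * lrfd + w_frfd * frfd + w_ssod * ssod


# Toy usage: two feature values per map, two teacher boxes, one of them confident.
loss = total_distill_loss(
    lidar_feat=[1.0, 2.0],
    fused_feat=[0.5, 0.5],
    radar_feat_early=[1.0, 0.0],
    radar_feat_late=[0.5, 0.5],
    teacher_scores=[0.9, 0.2],
    teacher_boxes=[[0.0, 0.0, 1.0], [5.0, 5.0, 5.0]],
    student_boxes=[[0.0, 1.0, 1.0], [0.0, 0.0, 0.0]],
)
print(loss)
```

Because the output term depends only on teacher predictions, the same objective can be evaluated on frames without annotations, which is the property the summary attributes to SSOD. A real implementation would operate on tensors in a deep learning framework and would need an explicit box-matching step.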