Fast Detection and Obstacle Avoidance on UAVs Using Lightweight Convolutional Neural Network Based on the Fusion of Radar and Camera

Xiyue Wang, Xinsheng Wang, Zhiquan Zhou, Yanhong Song
DOI: https://doi.org/10.1007/s10489-024-05768-5
IF: 5.3
2024-01-01
Applied Intelligence
Abstract: Multi-sensor information fusion (MSIF) based on deep convolutional neural networks (CNNs) is widely used for UAV obstacle avoidance. In practice, however, detection efficiency is limited by the high computational complexity of these networks and by scarce onboard hardware resources. This paper proposes a fast detection method based on a lightweight CNN that fuses millimeter-wave (MMW) radar and camera data. In the data preprocessing stage, the network's input images are preprocessed using both the radar and image data: a rough detection algorithm computes and segments a saliency image to obtain regions of interest (ROIs), and the image pixels outside the ROIs are masked out, which reduces the computational cost of network training and prediction. In the deep-learning detection stage, a lightweight network structure based on ResNet18 fuses the saliency images at different convolution depths. In the post-decoding stage, the calibration points are fused to perform local non-maximum suppression (NMS). Compared with typical detection methods, the proposed method improves detection speed by removing redundant pixels and applying local NMS, and improves detection accuracy by fusing radar and camera feature information at multiple stages. Experimental results indicate that, compared with the latest lightweight network (YOLOv8n), detection accuracy increases by 10.51% and FPS increases by 44.43%. Compared with the latest YOLOv8s and YOLOv9m models, FPS increases by 3.0x to 4.4x. A field-programmable gate array (FPGA) implementation achieves 60.0 FPS. These results are an improvement over typical methods and indicate that the proposed method is more effective than state-of-the-art models.
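The two speed-oriented steps named in the abstract, discarding pixels outside the ROIs and running NMS locally, can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that pixels outside the ROIs are filled with zero, that a box belongs to the first projected radar point it contains, and that boxes containing no radar point are dropped. The names `mask_outside_rois`, `local_nms`, and the 0.5 IoU threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def mask_outside_rois(image: List[List[int]],
                      rois: List[Tuple[int, int, int, int]],
                      fill: int = 0) -> List[List[int]]:
    """Return a copy of `image` with pixels outside every ROI set to `fill`.

    Assumption: zero fill; ROIs are (x1, y1, x2, y2) with exclusive x2, y2.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [[fill] * width for _ in range(height)]
    for x1, y1, x2, y2 in rois:
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                out[y][x] = image[y][x]
    return out


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], thr: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it above `thr`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thr for j in keep):
            keep.append(i)
    return keep


def local_nms(boxes: List[Box], scores: List[float],
              points: List[Tuple[float, float]], thr: float = 0.5) -> List[int]:
    """Run NMS separately within each group of boxes sharing a radar point.

    Assumption: a box is assigned to the first radar point lying inside it,
    and boxes that contain no radar point are discarded.
    """
    groups: Dict[int, List[int]] = {}
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        for p, (px, py) in enumerate(points):
            if x1 <= px <= x2 and y1 <= py <= y2:
                groups.setdefault(p, []).append(i)
                break
    kept: List[int] = []
    for members in groups.values():
        local_keep = nms([boxes[i] for i in members],
                         [scores[i] for i in members], thr)
        kept.extend(members[k] for k in local_keep)
    return sorted(kept)
```

Under these assumptions, each NMS call only compares boxes that share a radar point, so the pairwise IoU work scales with the size of each group rather than with the total number of candidate boxes.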