Multispectral Object Detection Based on Multilevel Feature Fusion and Dual Feature Modulation

Jin Sun, Mingfeng Yin, Zhiwei Wang, Tao Xie, Shaoyi Bei
DOI: https://doi.org/10.3390/electronics13020443
IF: 2.9
2024-01-22
Electronics
Abstract: Multispectral object detection is a crucial technology in remote sensing image processing, particularly in low-light environments. Most current methods extract features at a single scale, resulting in the fusion of invalid features and the failure to detect small objects. To address these issues, we propose a multispectral object detection network based on multilevel feature fusion and dual feature modulation (GMD-YOLO). First, a novel dual-channel CSPDarknet53 network is used to extract deep features from visible-infrared images. This network incorporates a Ghost module, which generates additional feature maps through a series of linear operations, achieving a balance between accuracy and speed. Second, the multilevel feature fusion (MLF) module is designed to utilize cross-modal information through the construction of hierarchical residual connections. This approach strengthens the complementarity between different modalities, allowing the network to improve multiscale representation capabilities at a more refined granularity level. Finally, a dual feature modulation (DFM) decoupled head is introduced to enhance small object detection. This decoupled head effectively meets the distinct requirements of classification and localization tasks. GMD-YOLO is validated on three public visible-infrared datasets: DroneVehicle, KAIST, and LLVIP. On DroneVehicle and LLVIP, it achieves mAP@0.5 of 78.0% and 98.0%, outperforming baseline methods by 3.6% and 4.4%, respectively. On KAIST, it attains a miss rate (MR) of 7.73% at 61.7 FPS. Experimental results demonstrate that our method surpasses existing advanced methods and exhibits strong robustness.
engineering, electrical & electronic; computer science, information systems; physics, applied
What problem does this paper attempt to address?
The paper targets multispectral object detection in low-light environments (such as at night), where existing methods usually extract features at a single scale, which leads to the fusion of invalid features and difficulty in detecting small objects. Specifically:

1. **Limitations of single-scale feature extraction**: Most existing methods extract features at only one scale, so invalid features are fused and small objects are not detected effectively.
2. **Insufficient cross-modal information complementarity**: Existing methods often overlook the complementarity of cross-modal information at different levels, which degrades model performance.
3. **Conflict between classification and localization tasks**: Decoupled heads provide separate branches for classification and localization, but both branches are applied to the same input features, which may lead to false or missed detections of small objects.

To overcome these problems, the authors propose a multispectral object detection network (GMD-YOLO) based on multilevel feature fusion and dual feature modulation. The method aims to improve the accuracy and robustness of object detection in low-light environments in the following ways:

- **Dual-channel CSPDarknet53 network**: A new dual-channel CSPDarknet53 network extracts deep features from visible-light and infrared images. It introduces the Ghost module, which generates additional feature maps through a series of linear operations, achieving a balance between accuracy and speed.
- **Multilevel feature fusion (MLF) module**: The MLF module uses cross-modal information by constructing hierarchical residual connections. This enhances the complementarity between modalities and lets the network improve its multiscale representation ability at a finer granularity level.
- **Dual feature modulation (DFM) decoupled head**: A new DFM decoupled head strengthens small-object detection and meets the different requirements of the classification and localization tasks.

Experimental results show that GMD-YOLO outperforms the baseline methods on three publicly available visible-infrared datasets (DroneVehicle, KAIST, and LLVIP), with a particularly significant advantage in small-object detection.

In summary, the paper addresses the feature extraction and fusion problems that existing methods face in multispectral object detection under low-light conditions, and it improves detection performance through a redesigned network structure and new modules.
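
The accuracy/speed trade-off attributed to the Ghost module can be made concrete with a weight count. The sketch below follows the general GhostNet formulation, not the exact GMD-YOLO configuration, which this summary does not give. A primary convolution produces only a fraction of the output channels, and cheap per-channel (depthwise) filters generate the rest. The function names, the ratio `s`, the cheap-filter size `d`, and the layer sizes in the usage lines are all illustrative assumptions.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Weights in a Ghost module under the GhostNet formulation (bias ignored).

    A primary k x k convolution produces c_out // s "intrinsic" maps. Each
    intrinsic map then yields (s - 1) additional "ghost" maps through cheap
    d x d depthwise filters, so the module still outputs c_out channels.
    s and d are assumed values, not taken from the paper summarized here.
    """
    if c_out % s != 0:
        raise ValueError("c_out must be divisible by s in this sketch")
    intrinsic = c_out // s
    primary = c_in * intrinsic * k * k      # ordinary convolution
    cheap = intrinsic * (s - 1) * d * d     # one d x d filter per ghost map
    return primary + cheap


# Hypothetical layer: 128 input channels, 256 output channels, 3 x 3 kernel.
standard = conv_params(128, 256, 3)   # 294912 weights
ghost = ghost_params(128, 256, 3)     # 148608 weights
```

With `s = 2`, the Ghost variant of this hypothetical layer needs roughly half the weights of the standard convolution, which is the kind of saving that lets a dual-branch backbone process two modalities without doubling its cost.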