FSFM: A Feature Square Tower Fusion Module for Multimodal Object Detection.

Xiaomin Liu, Chen Zhu, Chunyu Yang, Linna Zhou
DOI: https://doi.org/10.1109/TIM.2023.3244210
2023-01-01
Abstract: As application demands grow, single-modal data can no longer provide sufficient information for object detection. Processing multimodal information sensibly and fusing the modality-specific information of different data sources is an active research topic in data processing. To this end, this article proposes a feature square tower fusion module, called FSFM, which performs multimodal feature fusion by aggregating multilevel feature information and is used for object detection. First, a feature square tower strategy is put forward and embedded into the multimodal feature fusion framework; the multilevel features of the two modalities are fused by top-down feature aggregation. Second, a feature constraint module is constructed that minimizes the foreground and background classification losses, constraining the infrared features to make them more salient. Third, a weighted feature fusion strategy based on second-order statistics (SOS) is proposed so that the fused features remain strongly discriminative across different scenarios. Finally, Faster R-CNN is applied to the fused features for detection. To illustrate the effectiveness of the method, object detection experiments are conducted on the multimodal datasets M3FD and Multispectral. The results show that the proposed network achieves better detection performance on the fused features.
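The abstract does not give the formula behind the SOS-based weighted fusion, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that the second-order statistic is the spatial variance of each channel and that the two modalities are combined with per-channel weights proportional to that variance. The function name `sos_weighted_fusion`, the `(C, H, W)` layout, and the `eps` parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def sos_weighted_fusion(feat_rgb, feat_ir, eps=1e-8):
    """Fuse two (C, H, W) feature maps with variance-based channel weights.

    Assumption (not from the paper): the second-order statistic is the
    spatial variance of each channel, and a channel with higher variance
    is treated as more informative.
    """
    if feat_rgb.shape != feat_ir.shape:
        raise ValueError("both feature maps must have the same shape")

    # Per-channel variance over the spatial dimensions (H, W).
    var_rgb = feat_rgb.var(axis=(1, 2))
    var_ir = feat_ir.var(axis=(1, 2))

    # Normalize so that the two weights of every channel sum to 1.
    # eps is split evenly so that two constant channels get 0.5 each.
    total = var_rgb + var_ir + eps
    w_rgb = ((var_rgb + eps / 2) / total)[:, None, None]
    w_ir = ((var_ir + eps / 2) / total)[:, None, None]

    # Weighted sum of the two modalities, channel by channel.
    return w_rgb * feat_rgb + w_ir * feat_ir
```

In this sketch, a channel that is flat in one modality but varies in the other is taken almost entirely from the varying modality, which is one plausible way to keep the fused features discriminative when one sensor carries little information (for example, a visible-light image at night).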