From Optical to SAR: A SAR Ship Detection Algorithm Based on Multi-Level Cross-Modality Alignment
HE Jiayue, SU Nan, XU Cong'an, YIN Lu, LIAO Yanping, YAN Yiming
DOI: https://doi.org/10.11834/jrs.20243249
2024-01-01
Abstract: In recent years, interest in Synthetic Aperture Radar (SAR) ship detection has grown considerably, and its distinctive strengths make it important in numerous fields of research. However, the inherent characteristics of SAR images present a range of challenges. For instance, in contrast to optical images, SAR images have a counterintuitive feature representation. Additionally, because SAR image data are limited, existing methods that depend on a large number of annotated SAR images may struggle to achieve satisfactory results. How to effectively train a high-performance SAR ship detection network with a limited quantity of SAR images therefore needs to be investigated. Given that single-modality SAR detection algorithms have inherent limitations, other modalities that can assist the SAR modality are needed. In SAR image target detection, optical images can serve as a supplementary data source: a knowledge-rich model can be developed by training on a large volume of optical data together with SAR data. Hence, reasonable training approaches that effectively utilize images from both the SAR and optical modalities should be explored.
To address these challenges, this paper proposes MCMA-Net, a SAR ship detection algorithm based on multilevel cross-modality alignment. MCMA-Net enriches the SAR feature representation by incorporating valuable knowledge from the optical modality. First, we propose a neighborhood-global attention-based feature interaction network (NGAN), which employs a neighborhood attention mechanism that enables local interaction of low-level features and a global self-attention mechanism that captures global context from high-level features. By improving the encoding of local features while retaining global context modeling, NGAN enables the network to focus on the corresponding information at different levels and facilitates the subsequent multilevel modality alignment. Second, we propose a multilevel modality alignment module (MLMA), which aligns the features of the two modalities in different hidden spaces at three levels. MLMA helps the model acquire modality-invariant features, bridging the modality gap and transferring optical knowledge. Valuable information from the optical modality can compensate for certain deficiencies in SAR images. With the aid of these two modules, we incorporate the advantageous information of the optical modality while retaining SAR's inherent advantages, thereby improving the performance of SAR detection. Our algorithm outperforms current detection algorithms. Notably, on both public SAR image datasets and our own SAR image dataset, MCMA-Net consistently achieves the best detection results, which indicates that the model is stable and robust. The visualization results indicate that MCMA-Net detects ships well in complex scenarios. The ablation experiments show that, compared with the baseline model, our algorithm achieves a 2.7% increase in mAP on the SSDD dataset. These experimental results consistently validate the design of MCMA-Net.
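The abstract gives no formulas, so the following is only an illustrative sketch of the two ideas it names, not the authors' implementation. It assumes a 1-D token sequence in place of image feature maps, uses the raw tokens as queries, keys, and values (no learned projections), and assumes a simple alignment loss that matches the per-level mean features of the two modalities. The function names `attention`, `mean_vector`, and `alignment_loss` and the `radius` parameter are hypothetical.

```python
import math


def attention(tokens, radius=None):
    """Scaled dot-product self-attention over a list of feature vectors.

    radius=None lets every position attend to all positions (global
    self-attention); radius=r restricts each position to neighbours at
    most r steps away (a 1-D stand-in for neighborhood attention).
    """
    dim = len(tokens[0])
    out = []
    for i, query in enumerate(tokens):
        # Indices this position is allowed to attend to.
        idx = [j for j in range(len(tokens))
               if radius is None or abs(i - j) <= radius]
        scores = [sum(a * b for a, b in zip(query, tokens[j])) / math.sqrt(dim)
                  for j in idx]
        # Numerically stable softmax over the allowed positions.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w / total * tokens[j][d] for w, j in zip(weights, idx))
                    for d in range(dim)])
    return out


def mean_vector(tokens):
    """Average a list of feature vectors into a single vector."""
    n = len(tokens)
    return [sum(t[d] for t in tokens) / n for d in range(len(tokens[0]))]


def alignment_loss(sar_levels, opt_levels):
    """Assumed loss: for each feature level, the squared distance between
    the mean SAR feature and the mean optical feature, summed over levels."""
    loss = 0.0
    for sar, opt in zip(sar_levels, opt_levels):
        mean_sar, mean_opt = mean_vector(sar), mean_vector(opt)
        loss += sum((a - b) ** 2 for a, b in zip(mean_sar, mean_opt))
    return loss


# Toy features: neighborhood attention at the low level, global at the high level.
sar_low = attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], radius=1)
sar_high = attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
opt_low = attention([[0.5, 0.5], [0.0, 1.0], [1.0, 0.0]], radius=1)
opt_high = attention([[0.5, 0.5], [0.0, 1.0], [1.0, 0.0]])
print(alignment_loss([sar_low, sar_high], [opt_low, opt_high]))
```

In a training loop of this kind, such a loss would be added to the detection loss so that SAR features at each level are pulled toward their optical counterparts; the number of levels (three in the abstract) is just the length of the two lists.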