Learning SAR-Optical Cross Modal Features for Land Cover Classification

Yujun Quan, Rongrong Zhang, Jian Li, Song Ji, Hengliang Guo, Anzhu Yu
DOI: https://doi.org/10.3390/rs16020431
IF: 5
2024-01-23
Remote Sensing
Abstract: Synthetic aperture radar (SAR) and optical images provide highly complementary ground information. The fusion of SAR and optical data can significantly enhance semantic segmentation inference results. However, fusing multimodal data remains a challenge for current research because imaging mechanisms differ significantly across sources. Our goal was to bridge the significant gaps between optical and SAR images by developing a dual-input model that utilizes image-level fusion. To improve on most existing state-of-the-art image fusion methods, which often assign equal weights to multiple modalities, we employed the principal component analysis (PCA) transform. Subsequently, we performed feature-level fusion on shallow feature maps, which retain rich geometric information. We also incorporated a channel attention module to highlight channels rich in features and suppress irrelevant information. This step is crucial because SAR and optical images are substantially similar in shallow-layer features such as geometric features. In summary, we propose a generic multimodal fusion strategy, designed with two inputs, that can be attached to most encoding–decoding structures for feature classification tasks. One input is the optical image, and the other is the three-band fusion data obtained by combining the PCA component of the optical image with the SAR image. Our feature-level fusion method effectively integrates multimodal data. The efficiency of our approach was validated using various public datasets, and the results showed significant improvements when applied to several land cover classification models.
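The image-level step described in the abstract (combining the PCA component of the optical image with the SAR data into three bands) can be illustrated with a minimal NumPy sketch. The abstract does not give the exact combination rule, so everything specific here is an assumption: the function names `pca_components`, `min_max` and `fuse_pca_sar` are hypothetical, the sketch simply stacks the leading principal components of the optical image with the SAR band(s) until three bands are reached, and each band is min-max scaled independently.

```python
import numpy as np


def pca_components(optical, k):
    """Project an (H, W, C) image onto its top-k principal components -> (H, W, k)."""
    h, w, c = optical.shape
    pixels = optical.reshape(-1, c).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)
    # Band-by-band covariance matrix, shape (C, C).
    cov = centered.T @ centered / (pixels.shape[0] - 1)
    # eigh returns eigenvalues in ascending order, so sort descending and keep k.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    return (centered @ eigvecs[:, order]).reshape(h, w, k)


def min_max(band):
    """Scale one band to [0, 1]; a constant band maps to all zeros."""
    lo, hi = band.min(), band.max()
    return (band - lo) / (hi - lo) if hi > lo else np.zeros_like(band)


def fuse_pca_sar(optical, sar):
    """Assumed fusion rule: stack leading optical PCs with SAR band(s) into 3 bands."""
    if sar.ndim == 2:
        sar = sar[..., None]  # treat single-band SAR as (H, W, 1)
    n_pcs = 3 - sar.shape[2]
    if n_pcs < 1:
        raise ValueError("this sketch expects one or two SAR bands")
    pcs = pca_components(optical, n_pcs)
    bands = [min_max(pcs[..., i]) for i in range(n_pcs)]
    bands += [min_max(sar[..., j].astype(np.float64)) for j in range(sar.shape[2])]
    return np.stack(bands, axis=-1)


# Synthetic stand-ins for a co-registered optical patch (4 bands) and SAR patch.
rng = np.random.default_rng(0)
optical_patch = rng.random((64, 64, 4))
sar_patch = rng.random((64, 64))
fused = fuse_pca_sar(optical_patch, sar_patch)
print(fused.shape)  # (64, 64, 3)
```

In a dual-input setup of the kind the abstract describes, `optical_patch` would feed one branch and `fused` the other; how the paper actually weights or orders the bands may differ from this sketch.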
environmental sciences; imaging science & photographic technology; remote sensing; geosciences, multidisciplinary
What problem does this paper attempt to address?
The paper addresses the problem of fusing multimodal data, namely synthetic aperture radar (SAR) images and optical images, for the land cover classification (LCC) task. Specifically, it aims to bridge the significant differences between optical and SAR images with a dual-input model. At the image level, it applies the principal component analysis (PCA) transform to improve on existing fusion methods, which often assign equal weights to the modalities: the PCA component of the optical image is combined with the SAR data to form a three-band fused input that accompanies the optical image. At the feature level, it fuses shallow feature maps, which retain rich geometric information, and adds a channel attention module to highlight informative channels and suppress irrelevant information. The resulting strategy is generic and can be attached to most encoder-decoder structures. The paper validates the method on multiple public datasets and reports significant improvements when it is applied to several land cover classification models. Overall, the study aims to overcome the limitations of single-modal data by combining the complementary strengths of SAR and optical images, thereby improving the accuracy and robustness of land cover classification.
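The channel attention step mentioned above can be sketched as follows. The summary does not specify the module's design, so this assumes a squeeze-and-excitation style block (global average pooling, a two-layer bottleneck, and a sigmoid gate); the function name `channel_attention`, the reduction ratio of 4, the feature shapes, and the random weights standing in for learned parameters are all hypothetical.

```python
import numpy as np


def channel_attention(features, w1, w2):
    """Reweight the channels of a (C, H, W) feature map.

    w1 has shape (C // r, C) and w2 has shape (C, C // r), where r is the
    bottleneck reduction ratio. Returns the reweighted map and the weights.
    """
    # Squeeze: one descriptor per channel via global average pooling.
    squeezed = features.mean(axis=(1, 2))
    # Excite: bottleneck with ReLU, then a sigmoid gate in (0, 1) per channel.
    hidden = np.maximum(w1 @ squeezed, 0.0)
    weights = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))
    # Scale every channel by its weight (broadcast over H and W).
    return features * weights[:, None, None], weights


rng = np.random.default_rng(0)
# Stand-ins for shallow feature maps from the optical branch and the fused branch.
optical_feat = rng.standard_normal((16, 32, 32))
fused_feat = rng.standard_normal((16, 32, 32))

# Assumed feature-level fusion: concatenate along the channel axis, then gate.
stacked = np.concatenate([optical_feat, fused_feat], axis=0)
channels, reduction = stacked.shape[0], 4
w1 = rng.standard_normal((channels // reduction, channels)) * 0.1
w2 = rng.standard_normal((channels, channels // reduction)) * 0.1

attended, weights = channel_attention(stacked, w1, w2)
print(attended.shape, weights.shape)  # (32, 32, 32) (32,)
```

Channels whose weight is close to 1 pass through almost unchanged, while channels with a weight near 0 are suppressed, which is the "highlight and suppress" behaviour described in the text. In a trained network `w1` and `w2` would be learned rather than sampled.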