Learning rich multimodal representation for robust land cover classification in fog

Weipeng Shi, Wenhu Qin, Zhonghua Yun
DOI: https://doi.org/10.1109/jsen.2024.3364150
IF: 4.3
2024-01-01
IEEE Sensors Journal
Abstract: When drones perform remote sensing tasks such as land cover classification in foggy conditions, sensor failures are frequently encountered, making it essential to utilize multimodal data. Heterogeneous feature spaces can be leveraged to obtain a comprehensive representation of specific locales, increasing the discriminability of terrestrial covers. However, existing multimodal fusion methodologies fail to fully capitalize on the unique attributes of individual modalities. Additionally, many approaches fall short in robust feature extraction and fusion from multimodal inputs, leaving room for improvement in classification accuracy. We introduce a cascading backbone network designed to adaptively fuse and mine the spatial contributions of multimodal data inputs. The signal-to-noise ratio (SNR) is used to guide attention towards salient spatial features within individual modalities. The proposed Context Representation Enhancement Module (CREM) enables the classification of spatial regions with more discriminative attributes, while mitigating the impact of irrelevant noise in digital surface models. Additionally, we engineer a Cross-Modal Fusion Module (CMFM) to recalibrate and refine multimodal feature tensors, exploring cross-modal complementarity. Experimental validation on three distinct datasets corroborates the robustness of our framework for land cover classification, substantiating its efficacy in semantic segmentation tasks in fog-corrupted environments.
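The abstract does not say how the SNR is computed or how it enters the attention mechanism, so the following is only a minimal sketch of the general idea, not the authors' CREM or CMFM. It assumes a per-pixel SNR estimated as local mean over local standard deviation, and uses that estimate to weight two co-registered modalities (for example, an optical band and a digital surface model) pixel by pixel. The function names `local_snr` and `snr_weighted_fusion`, the window radius, and the normalization scheme are all hypothetical.

```python
import math


def local_snr(img, radius=1, eps=1e-6):
    """Assumed SNR estimate: |local mean| / (local std + eps) in a square window."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Gather the window around (i, j), clipped at the image border.
            vals = [
                img[y][x]
                for y in range(max(0, i - radius), min(h, i + radius + 1))
                for x in range(max(0, j - radius), min(w, j + radius + 1))
            ]
            mean = sum(vals) / len(vals)
            var = sum((v - mean) ** 2 for v in vals) / len(vals)
            out[i][j] = abs(mean) / (math.sqrt(var) + eps)
    return out


def snr_weighted_fusion(mod_a, mod_b, radius=1, eps=1e-6):
    """Fuse two same-sized 2D maps, weighting each pixel by relative local SNR."""
    snr_a = local_snr(mod_a, radius, eps)
    snr_b = local_snr(mod_b, radius, eps)
    h, w = len(mod_a), len(mod_a[0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Normalize the two SNR values into weights that sum to 1;
            # eps keeps the split at 0.5 when both estimates are zero.
            w_a = (snr_a[i][j] + eps) / (snr_a[i][j] + snr_b[i][j] + 2 * eps)
            fused[i][j] = w_a * mod_a[i][j] + (1.0 - w_a) * mod_b[i][j]
    return fused


if __name__ == "__main__":
    clean = [[1.0] * 3 for _ in range(3)]  # stable modality
    noisy = [[2.0 if (i + j) % 2 == 0 else 0.0 for j in range(3)] for i in range(3)]
    for row in snr_weighted_fusion(clean, noisy):
        print([round(v, 4) for v in row])
```

In this toy case the fused map follows the stable modality almost everywhere, because its local SNR dominates that of the noisy one. A learned network would replace the fixed ratio with trainable attention, which this sketch does not attempt.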
Engineering, Electrical & Electronic; Instruments & Instrumentation; Physics, Applied