Explainable Multimodal Learning in Remote Sensing: Challenges and Future Directions

Alexander Günther, Hiba Najjar, Andreas Dengel
DOI: https://doi.org/10.1109/lgrs.2024.3404596
IF: 5.343
2024-06-19
IEEE Geoscience and Remote Sensing Letters
Abstract: Earth observation applications effectively leverage deep learning (DL) models to harness the abundantly available remote sensing (RS) data. To use all the modalities relevant to a specific task, these data sources can be fused using multimodal learning techniques. This is especially helpful when the input dataset contains both images and tabular data or when the temporal and spatial resolutions vary across the modalities of interest. Nevertheless, these fusion techniques typically grow more complex as the disparities in the nature of the fused modalities increase. The resulting complex DL models lack the explainability and transparency that are crucial in many sensitive human-related applications. In this letter, we describe how the research community in RS addresses the issue of model explainability in the context of multimodal learning. We additionally review the practices used in other application fields and identify some of the most promising explainability methods tailored for multimodal deep networks to be exploited in RS applications.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
What problem does this paper attempt to address?