Transformer-Based Incomplete Multi-Modal Learning for Land Cover Classification

Guozheng Xu, Xue Jiang, Yue Zhou, Jia Fu, Zicheng Huang, Xingzhao Liu
DOI: https://doi.org/10.1109/igarss53475.2024.10641815
2024-01-01
Abstract: Land cover (LC) classification via remote sensing is crucial for ecosystem monitoring and urban planning but faces the challenge of inconsistent multimodal data availability. Current techniques often falter when modalities are incomplete, resulting in reduced performance and adaptability. To address these issues, this study proposes the Transformer-based Incomplete Multi-Modal Learning (TIMML) framework. TIMML incorporates a Bernoulli indicator module during training to facilitate adaptation to missing modalities. This module, in tandem with a fusion token, enables the model to handle the random omission of modalities by selectively nullifying data streams and aggregating information from the remaining available modalities. Moreover, TIMML integrates a modality-aware regularization module designed to stabilize feature extraction, especially when it is perturbed by the Bernoulli indicator during training. Our comprehensive experiments demonstrate that TIMML not only handles missing modalities effectively but also outperforms existing methods in LC classification tasks.
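The random nullification of modalities described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `keep_prob` parameter, the rule that at least one modality is always kept, and the array shapes are all assumptions made for illustration. In particular, the abstract's transformer fusion token is replaced here by a plain average over the surviving modalities.

```python
import numpy as np

def bernoulli_modality_mask(num_modalities, keep_prob, rng):
    """Draw one Bernoulli keep/drop indicator per modality (hypothetical helper).

    Assumption: at least one modality is always kept so that the fused
    representation is never empty.
    """
    mask = rng.random(num_modalities) < keep_prob
    if not mask.any():
        mask[rng.integers(num_modalities)] = True
    return mask

def mask_and_fuse(features, mask):
    """Zero out dropped modalities, then average the ones that remain.

    features: list of M arrays, each of shape (tokens, dim).
    The masked mean is a simple stand-in for a learned fusion token.
    """
    stacked = np.stack(features)                    # (M, tokens, dim)
    m = mask.astype(stacked.dtype)[:, None, None]   # broadcast over tokens, dim
    masked = stacked * m                            # nullified data streams
    fused = masked.sum(axis=0) / m.sum()            # mean over kept modalities
    return masked, fused

# Usage: three hypothetical modalities, each with 4 tokens of dimension 8.
rng = np.random.default_rng(0)
feats = [rng.normal(size=(4, 8)) for _ in range(3)]
mask = bernoulli_modality_mask(len(feats), keep_prob=0.5, rng=rng)
masked, fused = mask_and_fuse(feats, mask)
```

At inference time the same masking would be driven by which modalities are actually available rather than by random draws, which is why training under random omission could help the model adapt to incomplete inputs.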