Multimodal Online Knowledge Distillation Framework for Land Use/Cover Classification Using Full or Missing Modalities
Xiao Liu, Fei Jin, Shuxiang Wang, Jie Rui, Xibing Zuo, Xiaobing Yang, Chuanxiang Cheng
DOI: https://doi.org/10.1109/tgrs.2024.3388604
IF: 8.2
2024-04-27
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Multimodal land use/cover classification using optical and synthetic aperture radar (SAR) images has attracted significant attention because the distinct radiometric and geometric characteristics of these images provide complementary information about land properties. However, the large differences between these modalities create a wide semantic gap, which makes effective feature fusion in multimodal learning difficult. Moreover, modalities are often missing in practical applications because of weather constraints or sensor malfunctions, which makes it hard to achieve high performance in cross-modal learning. In this study, we propose a multimodal online knowledge distillation (MMOKD) framework for land use/cover classification of optical and SAR images using either full or missing modalities. The framework trains one modality-fusion network alongside two modality-specific networks in an end-to-end manner, supporting both multimodal and cross-modal learning. More specifically, we develop a multimodal feature fusion (MFF) module for integrating heterogeneous features and a single-modal feature generation (SFG) module for encapsulating cross-modal complementary information. In addition, we propose the joint distillation with multitype fusion knowledge (JD-MFK) method, which guides the modality-specific student networks to learn comprehensively from the modality-fusion teacher network. Notably, we adopt an online distillation strategy that provides real-time feedback and updates the modality-fusion and modality-specific networks synchronously. Finally, we conduct extensive experiments on two multimodal land use/cover classification datasets, comparing against advanced multimodal fusion networks, cross-modal distillation networks, and specific baseline networks. The results demonstrate the effectiveness of the proposed MMOKD, which not only outperforms the other networks in both full- and missing-modality scenarios but also significantly improves model training efficiency.
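The online distillation idea in the abstract (one fusion teacher and two modality-specific students updated from a single joint objective) can be illustrated with a minimal sketch. This is not the paper's JD-MFK method: it swaps in the generic soft-label distillation loss (cross-entropy plus a temperature-scaled KL term), and the function names, the `temperature` and `alpha` parameters, and the example logits are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Cross-entropy of one sample against an integer class label."""
    return -math.log(softmax(logits)[label])

def kl_divergence(p, q):
    """KL(p || q); q is clamped to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, 1e-12))
               for pi, qi in zip(p, q) if pi > 0)

def online_distillation_loss(fusion_logits, optical_logits, sar_logits,
                             label, temperature=2.0, alpha=0.5):
    """Hypothetical joint loss for one sample (an assumption, not JD-MFK).

    The fusion (teacher) network and both single-modality (student)
    networks each get a supervised term, and each student is also
    pulled toward the teacher's softened prediction. Summing everything
    into one loss is what lets all three networks update in the same
    training step.
    """
    teacher_soft = softmax(fusion_logits, temperature)
    loss = cross_entropy(fusion_logits, label)
    for student_logits in (optical_logits, sar_logits):
        student_soft = softmax(student_logits, temperature)
        loss += cross_entropy(student_logits, label)
        # temperature**2 keeps the soft-label term on a comparable scale
        loss += alpha * temperature ** 2 * kl_divergence(teacher_soft,
                                                         student_soft)
    return loss

# Made-up logits for a 3-class problem; the true class is 0.
fusion = [2.0, 0.0, -1.0]
optical = [1.0, 0.5, -0.5]
sar = [0.2, 0.8, -0.3]
total_loss = online_distillation_loss(fusion, optical, sar, label=0)
```

At inference time with a missing modality, only the matching student network would be used. Whether gradients flow back into the teacher through the soft targets is a design choice this sketch leaves open.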
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics