LIVEnet: Linguistic-Interact-With-Visual Engager Domain Generalization for Cross-Scene Hyperspectral Imagery Classification

Yuanyuan Dang, Xianhe Zhang, Yongcheng Wang, Bing Liu
DOI: https://doi.org/10.1109/lgrs.2024.3407976
IF: 5.343
2024-06-21
IEEE Geoscience and Remote Sensing Letters
Abstract: Domain generalization (DG) has led to remarkable achievements in cross-scene hyperspectral image (HSI) classification. Inspired by contrastive language-image pretraining (CLIP), language-aware DG methods have been explored for cross-scene HSI classification with language prior knowledge. However, existing methods face two challenges: 1) they have a weak capacity to extract long-range contextual information and interclass correlation, and 2) owing to inadequate HSI-specific pretraining, the spatial-spectral features of HSI and the linguistic features cannot be aligned straightforwardly. To tackle these problems, a novel network built on a CLIP framework is proposed. It consists of three parts: an image encoder based on an encoder-only transformer, which captures global contextual information and interclass correlation; a frozen text encoder; and a cross-attention mechanism, named the linguistic-interact-with-visual engager (LIVE), which enhances the interaction between the two modalities. Extensive experiments demonstrate superior performance over state-of-the-art (SOTA) methods with a CLIP framework, with an overall accuracy (OA) of 83.39% on the UH dataset and 83.94% on the Pavia dataset.
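To make the cross-attention idea in the abstract concrete, the following is a minimal sketch of generic scaled dot-product cross-attention in plain Python. It is not the authors' LIVE module: the abstract does not state which modality supplies the queries, so the choice here (visual tokens as queries, text embeddings as keys and values) is an assumption, as are the function names `softmax` and `cross_attention`, the toy vectors, and the omission of learned projection matrices.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: vectors from one modality (assumed here: visual tokens)
    keys, values: vectors from the other modality (assumed here: text embeddings)
    Returns one fused vector per query, plus the attention weights.
    Learned query/key/value projections are omitted for brevity.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(w[j] * values[j][c] for j in range(len(values)))
               for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Hypothetical toy inputs: 2 visual tokens, 3 text (class-prompt) embeddings.
visual_tokens = [[1.0, 0.0], [0.0, 1.0]]
text_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, attn = cross_attention(visual_tokens, text_embeddings, text_embeddings)
```

Each row of `attn` is a probability distribution over the text embeddings, so every fused vector is a convex combination of linguistic features selected by its visual token. A real model would typically add learned projections, multiple heads, and residual connections.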
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics