Bridging Sensor Gaps via Attention Gated Tuning for Hyperspectral Image Classification

Xizhe Xue, Haokui Zhang, Zongwen Bai, Ying Li
2024-07-25
Abstract: Data-hungry HSI classification methods require high-quality labeled HSIs, which are often costly to obtain. This characteristic limits the performance potential of data-driven methods when dealing with limited annotated samples. Bridging the domain gap between data acquired from different sensors allows us to utilize abundant labeled data across sensors to break this bottleneck. In this paper, we propose a novel Attention-Gated Tuning (AGT) strategy and a triplet-structured transformer model, Tri-Former, to address this issue. The AGT strategy serves as a bridge, allowing us to leverage existing labeled HSI datasets, and even RGB datasets, to enhance the performance on new HSI datasets with limited samples. Instead of inserting additional parameters inside the basic model, we train a lightweight auxiliary branch that takes intermediate features from the basic model as input and makes predictions. The proposed AGT resolves conflicts between heterogeneous and even cross-modal data by suppressing disturbing information and enhancing useful information through a soft gate. Additionally, we introduce Tri-Former, a triplet-structured transformer with a spectral-spatial separation design that enhances parameter utilization and computational efficiency, enabling easier and more flexible fine-tuning. Comparison experiments conducted on three representative HSI datasets captured by different sensors demonstrate that the proposed Tri-Former achieves better performance than several state-of-the-art methods. Homologous, heterologous and cross-modal tuning experiments verify the effectiveness of the proposed AGT. Code has been released at: https://github.com/Cecilia-xue/AGT
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses **the cross-sensor domain gap in hyperspectral image (HSI) classification**. Specifically, it aims to overcome the following challenges:

1. **Scarcity of labeled data**: HSI classification methods usually require high-quality labeled data, which is time-consuming and expensive to obtain. This limits the performance potential of data-driven methods when only a few labeled samples are available.
2. **Differences in cross-sensor and cross-modal data**: Data acquired by different sensors differ significantly in structure and features, so transfer learning performs poorly when data from other sensors or modalities are used directly.

To solve these problems, the authors propose two main innovations:

### 1. Attention-Gated Tuning (AGT) Strategy

The AGT strategy reconciles conflicts between heterogeneous and cross-modal data by introducing a lightweight auxiliary branch. Specifically, it aims to:

- **Suppress interfering information**: A soft gate mechanism suppresses irrelevant or disturbing information.
- **Enhance useful information**: The gate strengthens the semantic information coming from the base model, improving model performance.
- **Utilize multi-source data**: Both existing HSI datasets and RGB datasets can be used to improve performance on new HSI datasets.

### 2. Tri-Former Model

Tri-Former is a triplet-structured transformer model with the following characteristics:

- **Spectral-spatial separation design**: Processing spectral and spatial information separately improves parameter utilization and computational efficiency.
- **3D convolution enhancement**: 3D convolution layers are added to the model to strengthen structural information and stabilize training.
- **Flexible fine-tuning ability**: The model is easier and more flexible to fine-tune, which is especially useful when few labeled samples are available.

### Summary

The main contributions of the paper are:

1. A new Attention-Gated Tuning (AGT) strategy that resolves conflicts in cross-sensor and cross-modal data.
2. A triplet-structured transformer model for HSI classification (Tri-Former), whose flexible architecture can learn features efficiently from a limited number of training samples.
3. A connection between RGB and HSI datasets that allows abundant labeled RGB data to improve HSI classification, especially when labeled HSI data is limited.

The paper reports that Tri-Former outperforms several state-of-the-art methods on three representative HSI datasets captured by different sensors, and that homologous, heterologous and cross-modal tuning experiments verify the effectiveness of the AGT strategy.
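
The soft gate idea behind AGT can be illustrated with a small NumPy sketch. This is not the authors' implementation (their code is in the linked repository); it only assumes that an auxiliary branch receives intermediate features from a frozen base model, scales each feature by a learned value between 0 and 1, and then classifies the pooled result. The class name `SoftGateBranch`, the layer shapes, the mean pooling and the choice of 9 classes are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function; squashes values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


class SoftGateBranch:
    """Hypothetical lightweight auxiliary branch with a soft gate.

    The base model is assumed to be frozen; only the parameters of this
    branch would be trained. Training itself is omitted from the sketch.
    """

    def __init__(self, feat_dim, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        # Gate parameters: one gate value per feature channel.
        self.w_gate = rng.normal(0.0, 0.02, size=(feat_dim, feat_dim))
        self.b_gate = np.zeros(feat_dim)
        # Linear classifier applied to the pooled, gated features.
        self.w_cls = rng.normal(0.0, 0.02, size=(feat_dim, num_classes))
        self.b_cls = np.zeros(num_classes)

    def gate(self, feats):
        """Return gate values in (0, 1) with the same shape as feats."""
        return sigmoid(feats @ self.w_gate + self.b_gate)

    def forward(self, feats):
        """feats: (batch, tokens, feat_dim) features from the base model."""
        # Values near 0 suppress a feature; values near 1 pass it through.
        gated = self.gate(feats) * feats
        pooled = gated.mean(axis=1)  # average over tokens
        return pooled @ self.w_cls + self.b_cls  # (batch, num_classes)


# Usage with random stand-in features (no real HSI data involved).
feats = np.random.default_rng(1).normal(size=(4, 16, 32))
branch = SoftGateBranch(feat_dim=32, num_classes=9)
logits = branch.forward(feats)
print(logits.shape)
```

Because the gate multiplies features rather than replacing them, a gated feature can never have a larger magnitude than the original, which is one simple way to read "suppressing disturbing information" while leaving the base model's weights untouched.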