DBCTNet: Double Branch Convolution-Transformer Network for Hyperspectral Image Classification

Rui Xu, Xue-Mei Dong, Weijie Li, Jiangtao Peng, Weiwei Sun, Yi Xu
DOI: https://doi.org/10.1109/tgrs.2024.3368141
IF: 8.2
2024-03-02
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Currently, deep learning (DL) methods represented by convolutional neural networks (CNNs) or Transformers are of great interest in hyperspectral image (HSI) classification. Recent works show that hybrid models using both CNN and Transformer modules are expected to achieve better performance than either used alone. However, the hybrid models applied to HSI classification combine 2-D CNNs with Transformers, which gives them high computational complexity. In addition, the information carried by the many spectral dimensions, which distinguishes HSIs from ordinary RGB images, has not been fully exploited. Based on this, we propose a double branch Convolution-Transformer network (DBCTNet). Specifically, an MSpeFE module is used for multiscale spectral feature extraction at the early stage of the proposed network. Then, a ConvTE block is designed to improve the original Transformer encoder (TE), where a Conv spectral projection unit and a convolutional multihead self-attention (CMHSA) unit are proposed to extract spatial and global spectral features. A double branch module is further built based on 3-D CNN and ConvTE. This module can fully integrate spatial and local–global spectral features while keeping computational complexity low. Experimental results on four public datasets, Pavia University, Houston, WHU-Hi-LongKou, and HuangHeKou, show that DBCTNet achieves satisfactory performance with a small number of parameters and relatively high efficiency compared to nine other networks. The implementation of DBCTNet will be made publicly available at https://github.com/xurui-joei/DBCTNet.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
What problem does this paper attempt to address?
This paper addresses the limitations of existing methods for hyperspectral image (HSI) classification. Specifically:

1. **High computational complexity**: Existing hybrid models for HSI classification combine 2-D convolutional neural networks (CNNs) with Transformers, which results in high computational complexity.
2. **Under-utilization of multi-dimensional spectral information**: Unlike ordinary RGB images, an HSI contains information in many spectral dimensions, but this information has not yet been fully exploited.
3. **Joint extraction of local and global features**: Existing methods often cannot handle local and global information simultaneously when extracting spatial and spectral features, and they are particularly weak in computational efficiency.

To address these problems, the authors propose a double-branch convolution-Transformer network (DBCTNet). Its main contributions are as follows:

1. **Multi-scale spectral feature extraction module (MSpeFE)**: This module performs multi-scale feature extraction along the spectral dimension using convolution kernels of different sizes, which enriches the spectral signal and improves model performance.
2. **Improved Transformer encoder (ConvTE)**: By replacing linear layers with convolution operations, ConvTE extracts spatial and global spectral features with fewer parameters and floating-point operations (FLOPs).
3. **Double-branch module (DBCT)**: Built on 3-D CNN and ConvTE, this module continuously models local and global representations and simultaneously extracts spatial and local-global spectral features.

With these components, experiments on four public datasets (Pavia University, Houston, WHU-Hi-LongKou, and HuangHeKou) show that DBCTNet achieves satisfactory performance with a relatively small number of parameters and relatively high efficiency.
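The idea behind multi-scale spectral feature extraction, filtering the spectral axis with convolution kernels of different sizes, can be illustrated with a small pure-Python sketch. This is not the authors' MSpeFE implementation: the kernel sizes (3, 5, 7), the fixed averaging weights standing in for learned weights, and the function names `conv1d_same` and `multiscale_spectral_features` are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def conv1d_same(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Cross-correlate a 1-D signal with an odd-length kernel.

    Zero padding is applied on both ends so the output has the same
    length as the input.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd for symmetric padding")
    half = len(kernel) // 2
    padded = [0.0] * half + [float(v) for v in signal] + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def multiscale_spectral_features(
    spectrum: Sequence[float],
    kernel_sizes: Sequence[int] = (3, 5, 7),  # assumed sizes, not from the paper
) -> Dict[int, List[float]]:
    """Filter one pixel's spectrum at several scales.

    Fixed averaging kernels are a stand-in for the learned convolution
    weights a real network would use (illustrative assumption).
    """
    features: Dict[int, List[float]] = {}
    for size in kernel_sizes:
        kernel = [1.0 / size] * size
        features[size] = conv1d_same(spectrum, kernel)
    return features


if __name__ == "__main__":
    # Toy 8-band spectrum with a single narrow peak at band index 3.
    spectrum = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    for size, response in multiscale_spectral_features(spectrum).items():
        print(size, [round(v, 3) for v in response])
```

Small kernels respond only to bands near the peak, while larger kernels spread the response over a wider spectral neighborhood. Stacking the per-scale outputs gives later layers access to both narrow and broad spectral context, which is the general motivation for a multi-scale design.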