3D-RCNet: Learning from Transformer to Build a 3D Relational ConvNet for Hyperspectral Image Classification

Haizhao Jing, Liuwei Wan, Xizhe Xue, Haokui Zhang, Ying Li
2024-08-25
Abstract: Recently, the Vision Transformer (ViT) model has replaced the classical Convolutional Neural Network (ConvNet) in various computer vision tasks due to its superior performance. Even in the hyperspectral image (HSI) classification field, ViT-based methods show promising potential. Nevertheless, ViT encounters notable difficulties in processing HSI data. Its self-attention mechanism, which exhibits quadratic complexity, escalates computational costs. Additionally, ViT's substantial demand for training samples does not align with the practical constraints posed by the expensive labeling of HSI data. To overcome these challenges, we propose a 3D relational ConvNet named 3D-RCNet, which inherits the strengths of both ConvNet and ViT, resulting in high performance in HSI classification. We embed the self-attention mechanism of Transformer into the convolutional operation of ConvNet to design a 3D relational convolutional operation and use it to build the final 3D-RCNet. The proposed 3D-RCNet maintains the high computational efficiency of ConvNet while enjoying the flexibility of ViT. Additionally, the proposed 3D relational convolutional operation is a plug-and-play operation, which can be inserted seamlessly into previous ConvNet-based HSI classification methods. Empirical evaluations on three representative benchmark HSI datasets show that the proposed model outperforms previous ConvNet-based and ViT-based HSI approaches.
Computer Vision and Pattern Recognition
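The abstract attributes ViT's cost on HSI data to the quadratic complexity of self-attention. The back-of-the-envelope sketch below makes that concrete under two illustrative assumptions that are not taken from the paper: every voxel of an H×W×D HSI patch is treated as one token, and a convolution-style operator relates each voxel only to a k×k×k neighborhood. The function names and the 9×9×30 patch size are hypothetical.

```python
def global_attention_scores(h: int, w: int, d: int) -> int:
    """Pairwise scores for full self-attention over all h*w*d voxel tokens: n**2."""
    n = h * w * d
    return n * n


def local_window_scores(h: int, w: int, d: int, k: int) -> int:
    """Scores when each voxel attends only to a k x k x k window: n * k**3.

    Border truncation is ignored, so this is an upper bound that grows linearly in n.
    """
    n = h * w * d
    return n * k ** 3


if __name__ == "__main__":
    # Hypothetical patch: 9 x 9 spatial window with 30 spectral bands, 3x3x3 window.
    g = global_attention_scores(9, 9, 30)
    loc = local_window_scores(9, 9, 30, 3)
    print(g, loc, g // loc)  # 5904900 65610 90
```

Doubling the number of spectral bands doubles the local-window cost but quadruples the global-attention cost, which is the scaling argument behind keeping a convolution-style locality constraint.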
### What problems does this paper attempt to solve?

This paper addresses several key challenges in hyperspectral image (HSI) classification, particularly those that arise when combining convolutional neural networks (ConvNets) and Vision Transformers (ViTs). Specifically, it targets the following problems:

1. **Computational complexity**:
   - The self-attention mechanism of ViT has quadratic complexity (\(O(n^2)\) in the number of tokens \(n\)), which makes it computationally expensive on hyperspectral image data.
   - Hyperspectral image data is usually high-resolution and has a complex three-dimensional structure, so a method is needed that processes 3D data efficiently while keeping computational complexity low.
2. **Training sample requirements**:
   - ViT needs a large number of training samples to perform well, whereas labeling hyperspectral image data is expensive, and large labeled datasets are hard to obtain in practice.
   - A model is therefore needed that can still extract features and classify effectively when labeled samples are limited.
3. **Limitations of a single architecture**:
   - A 3D ConvNet alone handles local features well but performs poorly at capturing long-range dependencies.
   - A ViT alone captures global features well, but its computational complexity and demand for training samples limit its wider use in hyperspectral image classification.

To solve these problems, the paper proposes a new model, the **3D Relational ConvNet (3D-RCNet)**, which embeds the self-attention mechanism of the Transformer into the convolution operation to form a 3D relational convolutional operation. This design inherits the efficiency of ConvNet and the flexibility of ViT, yielding better performance on the hyperspectral image classification task.

### Main contributions

1. **Proposing the 3D Relational Convolutional Block (3D-RCBlock)**: embeds the self-attention mechanism into the convolution operation to form a new HSI feature extraction operation that inherits the advantages of both ConvNet and ViT.
2. **Constructing a hybrid network framework**: builds a hybrid network on top of the proposed 3D-RCBlock, integrating 3D-RCBlock seamlessly into the classical 3D ConvNet.
3. **Conducting exhaustive ablation experiments**: analyzes each module in detail and draws comprehensive guiding conclusions for optimizing the model structure.

Through these improvements, 3D-RCNet achieves strong classification performance on three representative public hyperspectral image datasets, surpassing previous ConvNet-based and ViT-based methods.
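The summary states that 3D-RCNet embeds self-attention into the convolution operation but does not give the exact formulation. The NumPy sketch below shows one plausible reading, labeled as an assumption: local self-attention over a sliding k×k×k window, where each voxel's query is compared with the keys of its neighbors and the softmax weights replace fixed convolution weights when aggregating their values. The 1×1×1 projections `wq`, `wk`, `wv`, the function name `relational_conv3d`, and the scaled dot-product scoring are all hypothetical choices, not the authors' implementation.

```python
import numpy as np


def relational_conv3d(x, wq, wk, wv, k=3):
    """Hypothetical 3D relational convolution: windowed self-attention.

    x:          feature volume of shape (C, D, H, W)
    wq, wk, wv: (C_out, C) matrices acting as 1x1x1 projections
    k:          odd window size; each voxel attends to its k x k x k neighborhood
    Returns an array of shape (C_out, D, H, W), same spatial size as x.
    """
    c_out = wq.shape[0]
    _, d, h, w = x.shape
    r = k // 2
    # Per-voxel query, key and value projections.
    q = np.einsum("oc,cdhw->odhw", wq, x)
    key = np.einsum("oc,cdhw->odhw", wk, x)
    val = np.einsum("oc,cdhw->odhw", wv, x)
    pad = ((0, 0), (r, r), (r, r), (r, r))
    key_p, val_p = np.pad(key, pad), np.pad(val, pad)
    # 1 inside the volume, 0 in the padding; used to mask out-of-bounds neighbors.
    valid_p = np.pad(np.ones((1, d, h, w)), pad)
    scores, values = [], []
    # Slide over every offset in the window, like the taps of a 3D convolution kernel.
    for dz in range(k):
        for dy in range(k):
            for dx in range(k):
                sl = (slice(None), slice(dz, dz + d), slice(dy, dy + h), slice(dx, dx + w))
                s = (q * key_p[sl]).sum(axis=0) / np.sqrt(c_out)  # scaled dot product
                s = np.where(valid_p[sl][0] > 0, s, -np.inf)      # ignore padding
                scores.append(s)
                values.append(val_p[sl])
    scores = np.stack(scores)  # (k**3, D, H, W)
    values = np.stack(values)  # (k**3, C_out, D, H, W)
    # Softmax over the window; the center voxel is always valid, so no row is all -inf.
    scores = scores - scores.max(axis=0, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)
    # Attention-weighted aggregation replaces the fixed kernel weights of a convolution.
    return np.einsum("ndhw,nodhw->odhw", weights, values)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 5, 6, 6))  # hypothetical 4-channel, 5x6x6 volume
    wq, wk, wv = (rng.standard_normal((8, 4)) for _ in range(3))
    y = relational_conv3d(x, wq, wk, wv, k=3)
    print(y.shape)  # (8, 5, 6, 6)
```

Because the output keeps the input's spatial and spectral size, a block like this could stand in for a same-padded 3D convolution layer, which is one way to read the "plug-and-play" claim. Unlike a convolution, the aggregation weights depend on the input, while the cost still grows only with the window size k³ rather than with the whole patch.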