HyperSINet: A Synergetic Interaction Network Combined With Convolution and Transformer for Hyperspectral Image Classification
Qixing Yu, Weibo Wei, Dantong Li, Zhenkuan Pan, Chenyu Li, Danfeng Hong
DOI: https://doi.org/10.1109/tgrs.2024.3362471
IF: 8.2
2024-02-16
IEEE Transactions on Geoscience and Remote Sensing
Abstract: In hyperspectral images (HSIs), both local and nonlocal features play crucial roles in classification tasks. Vision transformers (VITs) can extract nonlocal features through attention mechanisms, while convolutional neural networks (CNNs) excel at handling local components. However, traditional dual-branch models based on VIT and CNN lack interaction between the branches during feature processing, which can lead to compatibility issues when the two types of features are merged. In this article, we propose HyperSINet, a synergetic interaction network that combines VIT and CNN and establishes interaction between the two branches, enabling mutual compensation between local and nonlocal features during training and ultimately improving classification performance. Specifically, we devise a pair of interactors, namely, Conv2Trans and Trans2Conv, which serve as intermediaries between the two branches, enabling the VIT branch to refine its local details while allowing the CNN branch to process nonlocal features with a larger receptive field. Typical feature maps are presented to visualize the function of the interactors. Furthermore, within the VIT branch, a VIT encoder with a local mask is developed to strike a balance between emphasizing nonlocal features and preserving local details, while a lightweight CNN block is designed to process spectral and spatial features in the CNN branch. Extensive experiments conducted on four real-world datasets demonstrate that, with a reasonable parameter count, HyperSINet surpasses several current state-of-the-art methods.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
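
The abstract mentions a VIT encoder with a local mask but does not define the mask. As a rough illustration only, the sketch below shows one common way a local attention mask can be built for tokens laid out on a spatial grid: each token may attend only to tokens within a fixed window around it. The function names (`local_mask`, `masked_softmax`), the `radius` parameter, and the use of Chebyshev distance are all assumptions made for this example and are not taken from the paper.

```python
import math


def local_mask(height, width, radius):
    """Return an N x N boolean matrix, where N = height * width.

    mask[i][j] is True when token j lies within `radius` (Chebyshev
    distance) of token i on the height x width grid. Hypothetical helper,
    not the mask definition used by HyperSINet.
    """
    coords = [(r, c) for r in range(height) for c in range(width)]
    return [
        [max(abs(r1 - r2), abs(c1 - c2)) <= radius for (r2, c2) in coords]
        for (r1, c1) in coords
    ]


def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores, ignoring masked-out tokens."""
    kept = [s for s, keep in zip(scores, mask_row) if keep]
    peak = max(kept)  # subtract the maximum for numerical stability
    exps = [
        math.exp(s - peak) if keep else 0.0
        for s, keep in zip(scores, mask_row)
    ]
    total = sum(exps)
    return [e / total for e in exps]


# A 3 x 3 patch with radius 1: the centre token (index 4) sees all nine
# tokens, while the top-left corner token (index 0) sees only four.
mask = local_mask(3, 3, 1)
center_weights = masked_softmax([0.0] * 9, mask[4])
corner_weights = masked_softmax([0.0] * 9, mask[0])
```

With a mask of this kind, a larger `radius` moves attention toward the unrestricted, nonlocal case, and a smaller one keeps it close to the local neighbourhood. This is one way to read the balance the abstract describes, though the paper's actual mechanism may differ.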