MASSFormer: Memory-Augmented Spectral-Spatial Transformer for Hyperspectral Image Classification
Le Sun, Hang Zhang, Yuhui Zheng, Zebin Wu, Zhonglin Ye, Haixing Zhao
DOI: https://doi.org/10.1109/tgrs.2024.3392264
IF: 8.2
2024-05-10
IEEE Transactions on Geoscience and Remote Sensing
Abstract: In recent years, convolutional neural networks (CNNs) have achieved remarkable success in hyperspectral image (HSI) classification tasks, primarily due to their outstanding spatial feature extraction capabilities. However, CNNs struggle to capture the diagnostic spectral information inherent in HSI. In contrast, vision transformers (ViTs) handle spectral sequence information well and excel at capturing long-range correlations between pixels and bands. Nevertheless, because information is lost during propagation, some existing transformer-based classification methods fail to mix spectral and spatial information sufficiently. To mitigate these limitations, we propose a memory-augmented spectral-spatial transformer (MASSFormer) for HSI classification. Specifically, MASSFormer incorporates two modules: the memory tokenizer (MT) and the memory-augmented transformer encoder (MATE). The former transforms spectral-spatial features into memory tokens that store prior knowledge. The latter extends the traditional multihead self-attention (MHSA) operation by incorporating these memory tokens, enabling ample information blending while alleviating potential depth decay in the model and consequently improving classification performance. Extensive experiments conducted on four benchmark datasets demonstrate that the proposed method outperforms state-of-the-art methods. The source code is available at https://github.com/hz63/MASSFormer for the sake of reproducibility.
imaging science & photographic technology; remote sensing; engineering, electrical & electronic; geochemistry & geophysics
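
The abstract says that MATE extends MHSA by incorporating memory tokens, but it does not say how they enter the attention computation. The following is only a minimal sketch of one common way to do this, assuming the memory tokens are appended to the keys and values so that every query can also attend to them. It is a single head with no learned projections; the function name `memory_augmented_attention` and all of its parameters are hypothetical and are not taken from the paper or its repository.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_augmented_attention(queries, keys, values, mem_keys, mem_values):
    """Single-head scaled dot-product attention with extra memory slots.

    Assumption (not confirmed by the abstract): memory tokens are
    concatenated to the keys and values along the token axis.
    Each argument is a list of equal-length float vectors.
    """
    all_keys = keys + mem_keys        # input tokens followed by memory tokens
    all_values = values + mem_values
    scale = 1.0 / math.sqrt(len(queries[0]))
    out_dim = len(all_values[0])

    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every input token and memory token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in all_keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of values, one output dimension at a time.
        outputs.append(
            [sum(wj * v[d] for wj, v in zip(w, all_values)) for d in range(out_dim)]
        )
    return outputs, weights


# Toy usage: two input tokens plus one memory token, all of dimension 2.
tokens = [[1.0, 0.0], [0.0, 1.0]]
memory = [[0.5, 0.5]]
outputs, weights = memory_augmented_attention(tokens, tokens, tokens, memory, memory)
```

In this sketch each query produces one attention weight per input token plus one per memory token, so stored information can be blended into every output without changing the number of output tokens.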