Spectral-Spatial Transformer Network for Hyperspectral Image Classification: A Factorized Architecture Search Framework

Zilong Zhong, Ying Li, Lingfei Ma, Jonathan Li, Wei-Shi Zheng
DOI: https://doi.org/10.1109/tgrs.2021.3115699
IF: 8.2
2021-01-01
IEEE Transactions on Geoscience and Remote Sensing
Abstract: Neural networks have dominated the research of hyperspectral image (HSI) classification, owing to the feature learning capacity of convolution operations. However, the fixed geometric structure of convolution kernels hinders long-range interaction between features from distant locations. In this article, we propose a novel spectral-spatial transformer network (SSTN), which consists of spatial attention and spectral association modules, to overcome the constraints of convolution kernels. We also design a factorized architecture search (FAS) framework that involves two independent subprocedures to determine the layer-level operation choices and block-level orders of SSTN. Unlike conventional neural architecture search (NAS), which requires a bilevel optimization of both network parameters and architecture settings, the FAS focuses only on finding optimal architecture settings, enabling a stable and fast architecture search. Extensive experiments conducted on five popular HSI benchmarks demonstrate the versatility of SSTNs compared with other state-of-the-art (SOTA) methods and justify the FAS strategy. On the University of Houston dataset, SSTN obtains overall accuracy comparable to SOTA methods with only a small fraction (1.2%) of the multiply-and-accumulate operations (MACs) of a strong baseline, the spectral-spatial residual network (SSRN). Most importantly, SSTNs outperform other SOTA networks using only 1.2% or fewer of the MACs of SSRN on the Indian Pines, the Kennedy Space Center, the University of Pavia, and the Pavia Center datasets.
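The abstract contrasts the fixed receptive field of a convolution kernel with attention, which lets distant pixels interact directly. The snippet below is a minimal sketch of generic dot-product self-attention over the pixels of an HSI patch, written in NumPy. It is not the SSTN spatial attention module from the paper: the function name `spatial_self_attention`, the random projection weights, and the patch size are all assumptions made for illustration.

```python
import numpy as np


def spatial_self_attention(patch, rng=None):
    """Generic dot-product self-attention over the pixels of an HSI patch.

    patch: array of shape (H, W, C), where C is the number of spectral
    channels (or learned features) per pixel.
    Returns the attended patch, shape (H, W, C), and the attention map,
    shape (H*W, H*W).
    """
    h, w, c = patch.shape
    rng = np.random.default_rng(0) if rng is None else rng

    # Hypothetical projection weights; a trained network would learn these.
    w_q, w_k, w_v = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(3))

    x = patch.reshape(h * w, c)              # flatten pixels into a sequence
    q, k, v = x @ w_q, x @ w_k, x @ w_v

    # Every pixel is scored against every other pixel, regardless of distance.
    scores = q @ k.T / np.sqrt(c)
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)       # row-wise softmax

    out = attn @ v
    return out.reshape(h, w, c), attn


# Usage on a synthetic 7x7 patch with 16 channels (illustrative sizes only).
patch = np.random.default_rng(1).standard_normal((7, 7, 16))
attended, attention_map = spatial_self_attention(patch)
print(attended.shape, attention_map.shape)
```

Each row of the attention map assigns a nonzero weight to all pixels in the patch, which is the long-range interaction that a small convolution kernel cannot provide in a single layer. The cost is an attention map that grows quadratically with the number of pixels, so a sketch like this is only practical for small patches.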