Pyramid Hierarchical Transformer for Hyperspectral Image Classification

Muhammad Ahmad, Muhammad Hassaan Farooq Butt, Manuel Mazzara, Salvatore Distifano
2024-04-23
Abstract: The traditional Transformer model encounters challenges with variable-length input sequences, particularly in Hyperspectral Image Classification (HSIC), leading to efficiency and scalability concerns. To overcome this, we propose a pyramid-based hierarchical transformer (PyFormer). This innovative approach organizes input data hierarchically into segments, each representing distinct abstraction levels, thereby enhancing processing efficiency for lengthy sequences. At each level, a dedicated transformer module is applied, effectively capturing both local and global context. Spatial and spectral information flow within the hierarchy facilitates communication and abstraction propagation. Integration of outputs from different levels culminates in the final input representation. Experimental results underscore the superiority of the proposed method over traditional approaches. Additionally, the incorporation of disjoint samples augments robustness and reliability, thereby highlighting the potential of our approach in advancing HSIC.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the efficiency and scalability problems that traditional Transformer models face with variable-length input sequences in hyperspectral image classification (HSIC). Specifically, the self-attention mechanism of a traditional Transformer has a cost that grows quadratically with sequence length, which makes long sequences computationally expensive, and the model struggles to capture spatial relation invariance. In addition, a traditional Transformer needs a large amount of training data and is otherwise prone to overfitting, which limits its effectiveness when labeled data are scarce. To solve these problems, the authors propose PyFormer, a hierarchical Transformer model based on a pyramid structure. The model improves the processing efficiency of long sequences by organizing the input data hierarchically into multiple segments, each representing a different level of abstraction. A dedicated Transformer module is applied at each level to capture both local and global context. Spatial and spectral information flows within the hierarchy, facilitating communication and abstraction propagation. The outputs of the different levels are then integrated to form the final input representation. Experimental results show that the proposed method outperforms traditional methods. The authors also report that using disjoint samples improves the robustness and reliability of the model, which points to its potential for advancing HSIC.
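To make the quadratic-cost argument concrete, the sketch below counts query-key pairs for full self-attention and for a simple segmented hierarchy. It is an illustration only, not the PyFormer architecture: the function names, the fixed `segment_len`, and the rule that each segment is summarized by a single token at the next level are all assumptions introduced here.

```python
def full_attention_pairs(seq_len: int) -> int:
    """Query-key pairs scored by standard self-attention: seq_len squared."""
    return seq_len * seq_len


def pyramid_attention_pairs(seq_len: int, segment_len: int) -> int:
    """Pairs scored by a hypothetical pyramid of segment-local attention.

    Assumed scheme (not taken from the paper): at each level the sequence
    is split into segments of `segment_len` tokens, attention runs inside
    each segment only, and every segment is reduced to one summary token
    that feeds the next level. The top level attends over what remains.
    """
    if segment_len < 2:
        raise ValueError("segment_len must be at least 2")
    total = 0
    length = seq_len
    while length > segment_len:
        full_segments, remainder = divmod(length, segment_len)
        # Attention inside each full segment, plus the shorter last one.
        total += full_segments * segment_len ** 2 + remainder ** 2
        # One summary token per segment forms the next level's sequence.
        length = full_segments + (1 if remainder else 0)
    # Top of the pyramid: plain attention over the remaining tokens.
    total += length * length
    return total


if __name__ == "__main__":
    n, s = 1024, 32
    print("full:   ", full_attention_pairs(n))
    print("pyramid:", pyramid_attention_pairs(n, s))
```

Under these assumptions, the pair count at the lowest level grows linearly in the sequence length for a fixed segment size, which is the kind of saving a hierarchical organization of the input is meant to provide.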