Abstract: Existing learning-based hyperspectral reconstruction methods show limitations in fully exploiting the information among the hyperspectral bands. As such, we propose to investigate the chromatic inter-dependencies in their respective hyperspectral embedding space. These embedded features can be fully exploited by querying the inter-channel correlations in a combinatorial manner, with the unique and complementary information efficiently fused into the final prediction. We found such independent modeling and combinatorial excavation mechanisms are extremely beneficial to uncover marginal spectral features, especially in the long wavelength bands. In addition, we have proposed a spatio-spectral attention block and a spectrum-fusion attention module, which greatly facilitates the excavation and fusion of information at both semantically long-range levels and fine-grained pixel levels across all dimensions. Extensive quantitative and qualitative experiments show that our method (dubbed CESST) achieves SOTA performance. Code for this project is at: https://github.com/AlexYangxx/CESST

What problem does this paper attempt to address?

This paper addresses a limitation of hyperspectral image reconstruction: existing methods do not make full use of the information shared between hyperspectral bands and perform especially poorly in the long-wavelength range. Specifically, existing learning-based methods combine the features of the RGB channels and project them into the high-dimensional spectral space at an early stage, which sacrifices some crucial local spectral features. The paper therefore proposes a new framework that extracts and fuses information by combinatorially embedding cross-channel spatial-spectral cues, improving the quality of the reconstructed hyperspectral image.

### Main Contributions

1. **Novel Framework**:
   - A new hyperspectral image reconstruction framework (CESST) is proposed. It first excavates the spatial-spectral features of each channel in the projected high-dimensional embedding space and only then performs cross-channel fusion. This channel-independent modeling ensures that local spectral features are revealed and preserved.
2. **Spectral Fusion Attention Module (SFAM)**:
   - A new spectral fusion attention module (SFAM) is designed. It queries cross-channel correlations in a combinatorial manner through six parallel Transformer branches, mining complementary information for cross-channel fusion.
3. **Efficient Spatial-Spectral Attention Block (SSAB)**:
   - An efficient plug-and-play spatial-spectral attention block (SSAB) is designed. It simultaneously extracts semantically long-range and fine-grained pixel-level spatial-spectral features in all dimensions, while keeping computational complexity linear in the spatial dimension.
4. **Experimental Verification**:
   - Extensive quantitative and qualitative experiments show that the CESST framework significantly outperforms existing state-of-the-art methods while requiring fewer parameters.

### Method Overview

1. **Network Architecture**:
   - A multi-scale encoder-decoder architecture is proposed. It has three layers of similar structure, each operating at a different scale (full-size, half-size, and quarter-size). At each scale, three encoder-decoder feature extraction blocks (FEB) independently learn the context features of each channel.
   - Each FEB contains two encoder blocks, a bottleneck block, and two decoder blocks, and each block contains a spatial-spectral attention block (SSAB).
2. **Spatial-Spectral Attention Block (SSAB)**:
   - SSAB consists of two parallel paths, spatial multi-head self-attention (Spatial-MSA) and spectral multi-head self-attention (Spectral-MSA), which together enhance cross-dimensional interaction.
   - Spatial-MSA combines conventional windowed multi-head self-attention (WMSA) with shuffled windowed multi-head self-attention (Shuffle-WMSA) to establish long-range cross-window interaction.
   - Spectral-MSA is inspired by existing methods and treats each spectral feature map as a token in order to attend to non-local spectral self-similarities.
3. **Spectral Fusion Attention Module (SFAM)**:
   - SFAM queries cross-channel correlations in a combinatorial manner through six parallel Transformer branches to mine complementary information for cross-channel fusion.
   - It has two parts: channel learning and spectral fusion. The channel learning part extracts the correlations between every two learned hyperspectral representations through the six branches and fuses them. The spectral fusion part concatenates the three representative hyperspectral features and feeds them into a residual coordinate attention block (RCAB) to generate a fine-grained pixel-level HSI signal.
4. **Objective Function**:
   - The \(L_{\text{MIX}}\) loss, which combines an SSIM loss and an L1 loss, is used together with the mean relative absolute error (MRAE) to impose supervisory consistency constraints at the pixel and feature levels.
   - The total loss function is \( L = L_{\text{MIX}}+\lambda_1 L_{\text{MRAE}} \), where \(\lambda_1\) is a hyperparameter that controls the relative importance of the two loss terms and is empirically set to 100.

### Experimental Results

- **Quantitative Results**:
  - Experiments were carried out on the NTIRE2022 HSI dataset and the ICVL HSI dataset, and the results show that CESST
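
The objective function described under Method Overview, \( L = L_{\text{MIX}}+\lambda_1 L_{\text{MRAE}} \) with \(\lambda_1 = 100\), can be illustrated with a minimal NumPy sketch. Several details here are assumptions rather than facts from the paper: the summary does not say how the SSIM and L1 terms are weighted inside \(L_{\text{MIX}}\), so a hypothetical weight `alpha` is used; SSIM is simplified to whole-band statistics instead of a sliding window; and a small `eps` is added to the MRAE denominator to avoid division by zero. The function names and the 31-band toy input are illustrative only.

```python
import numpy as np


def mrae(pred, target, eps=1e-6):
    """Mean relative absolute error: mean(|pred - target| / target).

    eps is an assumption added to keep the division stable.
    """
    return float(np.mean(np.abs(pred - target) / (target + eps)))


def l1(pred, target):
    """Mean absolute error over all bands and pixels."""
    return float(np.mean(np.abs(pred - target)))


def global_ssim(pred, target, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified SSIM computed from whole-band statistics.

    Inputs have shape (bands, height, width). A windowed SSIM is the
    usual choice; the global form is used here only to keep the sketch short.
    """
    axes = (1, 2)
    mu_p = pred.mean(axis=axes)
    mu_t = target.mean(axis=axes)
    var_p = pred.var(axis=axes)
    var_t = target.var(axis=axes)
    cov = ((pred - mu_p[:, None, None]) *
           (target - mu_t[:, None, None])).mean(axis=axes)
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2))
    return float(ssim.mean())


def mix_loss(pred, target, alpha=0.5):
    """SSIM loss plus L1 loss; alpha is a hypothetical mixing weight."""
    return alpha * (1.0 - global_ssim(pred, target)) + (1.0 - alpha) * l1(pred, target)


def total_loss(pred, target, lambda_1=100.0):
    """L = L_MIX + lambda_1 * L_MRAE, with lambda_1 = 100 as stated in the text."""
    return mix_loss(pred, target) + lambda_1 * mrae(pred, target)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.uniform(0.1, 1.0, size=(31, 8, 8))   # toy 31-band cube
    pred = gt * 1.1                               # uniform 10% over-estimate
    print("MRAE:", mrae(pred, gt))
    print("total loss:", total_loss(pred, gt))
```

Because MRAE normalizes each error by the ground-truth intensity, weighting it by 100 makes relative errors in dark, low-signal bands dominate the total, whereas the L1 term alone would favor bright bands.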

Hyperspectral Image Reconstruction via Combinatorial Embedding of Cross-Channel Spatio-Spectral Clues
