Spatial-Spectral Aggregation Transformer with Diffusion Prior for Hyperspectral Image Super-Resolution

Mingyang Zhang, Xiangyu Wang, Shuang Wu, Zhaoyang Wang, Maoguo Gong, Yu Zhou, Fenlong Jiang, Yue Wu
DOI: https://doi.org/10.1109/tcsvt.2024.3508844
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Constrained by imaging systems, hyperspectral images (HSIs) typically have low spatial resolution. Deep learning-based HSI super-resolution methods have achieved impressive results by learning the nonlinear mapping between low-resolution (LR) and high-resolution (HR) images. However, most of them take as input the LR image or a bicubically upsampled version of it, which leads to low-quality features and limits the detail the network can capture. As a powerful generative model, the diffusion model can learn both contextual semantics and textural details from distinct timesteps, enabling effective exploration of spatial-spectral distributions in high-dimensional data. In this paper, we propose a novel method that pretrains a diffusion model to extract high-quality prior information from original images and uses it to assist super-resolution. Specifically, we first train a diffusion model on original HSI patches in a self-supervised manner and then obtain prior features from the decoder of the pretrained denoising U-Net. To incorporate the prior features into the super-resolution model efficiently, we propose an adaptive fusion module based on spatial and spectral attention mechanisms, which enhances features in both dimensions while preserving their original characteristics. Additionally, to leverage the complementarity of spatial and spectral information, we design a spatial-spectral aggregation Transformer module that incorporates an adaptive interaction module to facilitate information exchange across dimensions, thereby enhancing representation capability. Extensive experiments on three public hyperspectral datasets demonstrate that the proposed method achieves excellent super-resolution performance and outperforms state-of-the-art methods in both quantitative metrics and visual quality.
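The abstract describes the adaptive fusion module only at a high level, so the following is a minimal illustrative sketch of spatial and spectral attention fusion, not the authors' implementation. It assumes the prior features have already been resized to match the super-resolution features, and it replaces the learned layers a real module would use with parameter-free average pooling followed by a sigmoid. The name adaptive_fusion and the (bands, height, width) layout are hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def adaptive_fusion(features, prior):
    """Fuse prior features into SR features (hypothetical, parameter-free sketch).

    Both inputs are arrays of shape (bands, height, width).
    """
    if features.shape != prior.shape:
        raise ValueError("features and prior must have the same shape")

    # Spectral attention: one weight per band, from the prior's spatial average.
    spectral_weights = sigmoid(prior.mean(axis=(1, 2)))[:, None, None]

    # Spatial attention: one weight per pixel, from the prior's band average.
    spatial_weights = sigmoid(prior.mean(axis=0))[None, :, :]

    # Enhance the features along both dimensions.
    enhanced = features * spectral_weights + features * spatial_weights

    # Residual connection: the original features pass through unchanged.
    return features + enhanced


# Usage: random stand-ins for SR features and diffusion prior features.
rng = np.random.default_rng(0)
lr_features = rng.standard_normal((31, 8, 8))
prior_features = rng.standard_normal((31, 8, 8))
fused = adaptive_fusion(lr_features, prior_features)
print(fused.shape)  # (31, 8, 8)
```

The residual term is one simple way to reflect the stated goal of enhancing features in both dimensions while preserving their original characteristics; a trained module would learn the attention weights rather than derive them from fixed pooling.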