Dual-Scale Transformer for Large-Scale Single-Pixel Imaging

Gang Qu, Ping Wang, Xin Yuan

2024-04-07

Abstract: Single-pixel imaging (SPI) is a promising computational imaging technique that produces images by solving an ill-posed reconstruction problem from a few measurements captured by a single-pixel detector. Deep learning has achieved impressive success in SPI reconstruction. However, the poor reconstruction performance and impractical imaging models of previous methods limit its real-world applications. In this paper, we propose a deep unfolding network with a hybrid-attention Transformer on the Kronecker SPI model, dubbed HATNet, to improve the imaging quality of real SPI cameras. Specifically, we unfold the computation graph of the iterative shrinkage-thresholding algorithm (ISTA) into two alternating modules: efficient tensor gradient descent and hybrid-attention multiscale denoising. By virtue of Kronecker SPI, the gradient descent module avoids the high computational overhead of previous gradient descent modules based on vectorized SPI. The denoising module is an encoder-decoder architecture powered by dual-scale spatial attention for high- and low-frequency aggregation and channel attention for global information recalibration. Moreover, we build an SPI prototype to verify the effectiveness of the proposed method. Extensive experiments on synthetic and real data demonstrate that our method achieves state-of-the-art performance. The source code and pre-trained models are available at

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses single-pixel imaging (SPI), a computational imaging technique that reconstructs an image from a small number of measurements captured by a single-pixel detector. Existing deep learning methods are limited in both reconstruction quality and practicality, in part because they rely on vectorized SPI models whose gradient descent steps are computationally expensive. The paper proposes a deep unfolding network called HATNet, which combines a hybrid-attention Transformer with the Kronecker SPI model. HATNet unfolds the iterative shrinkage-thresholding algorithm (ISTA) into two alternating modules: an efficient tensor gradient descent module and a hybrid-attention multiscale denoising module. Using the Kronecker SPI model lets the gradient descent module avoid the high computational burden of earlier modules built on vectorized SPI. The denoising module adopts an encoder-decoder structure that combines dual-scale spatial attention, which aggregates high- and low-frequency information, with channel attention, which recalibrates global information. The authors also built a prototype SPI system to validate the method. Experiments on synthetic and real data show that HATNet achieves state-of-the-art performance. Overall, the work aims to bridge the gap between SPI systems and optimization algorithms for compressive sensing (CS) and to improve the practical imaging quality of large-scale SPI.
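To make the contrast between the Kronecker and vectorized SPI models concrete, here is a minimal NumPy sketch of one ISTA-style iteration. It is an illustration under stated assumptions, not the authors' implementation. The symbols `H`, `W`, `step`, and `tau` are hypothetical names, the measurement model is assumed to have the separable form Y = H X Wᵀ, and plain soft-thresholding stands in for the learned hybrid-attention denoiser.

```python
import numpy as np

def kron_forward(X, H, W):
    """Assumed Kronecker SPI model: Y = H X W^T (two small matrices, no big one)."""
    return H @ X @ W.T

def kron_grad_step(X, Y, H, W, step):
    """Gradient step on 0.5 * ||H X W^T - Y||_F^2, computed in matrix form."""
    residual = kron_forward(X, H, W) - Y
    return X - step * (H.T @ residual @ W)

def vectorized_grad_step(x, y, A, step):
    """Same step for the vectorized model y = A x, which needs the full matrix A."""
    return x - step * (A.T @ (A @ x - y))

def soft_threshold(X, tau):
    """Classic ISTA shrinkage; a stand-in for the learned denoising module."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def ista(Y, H, W, step, tau, n_iter=50):
    """Alternate gradient descent and shrinkage, starting from H^T Y W."""
    X = H.T @ Y @ W
    for _ in range(n_iter):
        X = soft_threshold(kron_grad_step(X, Y, H, W, step), tau)
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.standard_normal((4, 8))   # hypothetical row-side sensing matrix
    W = rng.standard_normal((3, 6))   # hypothetical column-side sensing matrix
    X_true = rng.standard_normal((8, 6))
    Y = kron_forward(X_true, H, W)

    # With row-major flattening, the equivalent vectorized matrix is kron(H, W).
    A = np.kron(H, W)
    print(np.allclose(A @ X_true.ravel(), Y.ravel()))

    # A step size of 1 / L, with L the squared spectral norm of kron(H, W).
    step = 1.0 / (np.linalg.norm(H, 2) * np.linalg.norm(W, 2)) ** 2
    X_hat = ista(Y, H, W, step, tau=0.01)
    print(X_hat.shape)
```

For an n × n image, the vectorized matrix `A` has one column per pixel (n² columns), whereas `H` and `W` each have only n columns, which is the source of the saving the summary describes. In an unfolding network, the fixed `step` and the shrinkage function would typically be replaced by learned components in each stage.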
