From Patch to Pixel: A Transformer-Based Hierarchical Framework for Compressive Image Sensing.
Hongping Gan, Minghe Shen, Yi Hua, Chunyan Ma, Tao Zhang
DOI: https://doi.org/10.1109/tci.2023.3244396
IF: 5.4
2023-01-01
IEEE Transactions on Computational Imaging
Abstract: Convolutional neural network (CNN)-based reconstruction methods have dominated compressive sensing (CS) in recent years. However, existing CNN-based approaches are limited in capturing the non-local similarity of images because of the intrinsic characteristics of convolutional layers, i.e., locality and weight sharing. In parallel, the emerging Transformer architecture shows a strong capacity for modeling long-distance correlations among embedded tokens for language and images. Yet the vanilla Transformer does not considerably exceed CNN-based networks and instead shows roughly comparable performance; the likely culprit is the lack of a sophisticated inductive bias regarding local image structures. In this article, to overcome the restrictions of both paradigms, we propose a Transformer-based hierarchical framework, dubbed TCS-Net, for compressive image sensing (or image compressive sensing) in a patch-to-pixel manner. Concretely, the proposed TCS-Net consists of an image acquisition module and a reconstruction module, the latter comprising two key decoding phases: a patch-wise decoding phase and a pixel-wise decoding phase. The acquisition module implements data-driven image sampling by being learned jointly with the decoding phases. By adapting the Transformer architecture to the patch-to-pixel multi-stage pattern, our reconstruction module gradually decodes the CS measurements from patch-wise outlines to pixel-wise textures, thereby building a high-precision mapping for image reconstruction. Extensive experiments on several datasets verify that the proposed TCS-Net outperforms existing state-of-the-art image CS methods by considerable margins.
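To make the acquisition and patch-wise decoding steps concrete, the following is a minimal NumPy sketch of block-based compressive sampling followed by a linear initial reconstruction. It is not the TCS-Net implementation: the abstract does not give the architecture's details, so the random Gaussian sampling matrix `phi` (learned jointly in the paper), the pseudo-inverse decoder (a Transformer in the paper), the patch size, the sampling ratio, and all function names here are assumptions chosen for illustration. The pixel-wise refinement phase is not reproduced.

```python
import numpy as np


def split_into_patches(image, patch):
    """Split an (H, W) image into non-overlapping patches, one flattened patch per row."""
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be a multiple of patch"
    blocks = image.reshape(h // patch, patch, w // patch, patch).swapaxes(1, 2)
    return blocks.reshape(-1, patch * patch)


def merge_patches(patches, shape, patch):
    """Inverse of split_into_patches: reassemble flattened patches into an (H, W) image."""
    h, w = shape
    blocks = patches.reshape(h // patch, w // patch, patch, patch).swapaxes(1, 2)
    return blocks.reshape(h, w)


def sample(image, phi, patch):
    """Acquire CS measurements y = phi @ x for every patch x (one row of y per patch)."""
    return split_into_patches(image, patch) @ phi.T


def patchwise_decode(measurements, phi):
    """Minimum-norm linear estimate of each patch from its measurements.

    Assumption: a pseudo-inverse stands in for the learned patch-wise decoder.
    """
    return measurements @ np.linalg.pinv(phi).T


# Hypothetical settings: 8x8 patches at a 25% sampling ratio.
rng = np.random.default_rng(0)
patch, ratio = 8, 0.25
n = patch * patch
m = int(ratio * n)

# Assumption: a fixed random matrix; the paper learns the sampling operator instead.
phi = rng.standard_normal((m, n)) / np.sqrt(m)

image = rng.random((32, 32))
y = sample(image, phi, patch)
x0 = merge_patches(patchwise_decode(y, phi), image.shape, patch)
```

Here `x0` is only a coarse, patch-wise estimate that is consistent with the measurements; in a patch-to-pixel scheme as described in the abstract, a later pixel-wise phase would refine such an estimate to recover fine textures.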