Abstract: Recently, convolutional neural networks (CNNs) have achieved remarkable success on the single-image rain removal task. However, due to the intrinsic locality of convolution operations, CNN-based models are generally limited in their ability to model long-range dependencies explicitly. Transformers have achieved milestones in many artificial intelligence fields by mitigating these shortcomings of CNNs, but they can suffer from limited localization ability and high computational cost. To this end, we propose a novel global–local transformer, termed GLFormer, to model long-range dependencies for rain removal while remaining efficient. Specifically, we use a window-based local transformer block to build the shallow layers of GLFormer, which process high-resolution feature maps; this greatly reduces the computational complexity. A global transformer block is designed to construct the deep layers, which model long-range dependencies with global self-attention. With these designs, GLFormer avoids the limitation of computing self-attention only within a local window, which lacks global feature inference, while still reducing the computational effort to a large extent. Considering that local details are crucial for the recovery of degraded images, we further employ convolution operations in both the global and local transformer blocks to improve their ability to capture local context. In addition, a self-supervised pre-training strategy is introduced to mine sufficient image priors from ultra-large unlabeled image datasets. Our proposed method is extensively evaluated on several benchmark datasets, and the results show GLFormer to be superior to state-of-the-art approaches built upon convolution.
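
The cost argument behind the window-based local block can be illustrated with a minimal NumPy sketch. It is not the GLFormer implementation: the function names (`window_partition`, `self_attention`, `attention_pairs`), the window size, and the use of identity query/key/value projections are all assumptions made for illustration. It only shows how restricting self-attention to non-overlapping windows reduces the number of pairwise attention scores compared with global self-attention over the same feature map.

```python
import numpy as np


def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping win x win windows.

    Returns an array of shape (num_windows, win * win, C).
    """
    H, W, C = x.shape
    assert H % win == 0 and W % win == 0, "H and W must be divisible by win"
    x = x.reshape(H // win, win, W // win, win, C)
    x = x.transpose(0, 2, 1, 3, 4)  # (H/win, W/win, win, win, C)
    return x.reshape(-1, win * win, C)


def self_attention(tokens):
    """Single-head self-attention over the second-to-last axis.

    Assumption: identity query/key/value projections, to keep the sketch short.
    tokens has shape (..., N, C); the output has the same shape.
    """
    scale = tokens.shape[-1] ** -0.5
    scores = tokens @ np.swapaxes(tokens, -1, -2) * scale  # (..., N, N)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ tokens


def attention_pairs(H, W, win=None):
    """Count pairwise attention scores for global (win=None) or windowed attention."""
    n = H * W
    if win is None:
        return n * n
    num_windows = n // (win * win)
    return num_windows * (win * win) ** 2


# Hypothetical sizes: a 64 x 64 feature map with 8 x 8 windows.
feat = np.random.default_rng(0).standard_normal((64, 64, 16))
local_out = self_attention(window_partition(feat, 8))   # attention inside each window
global_out = self_attention(feat.reshape(-1, 16))       # attention over all positions
```

Under these assumed sizes, global attention scores (64·64)² token pairs, whereas windowed attention scores 64 windows × (8·8)² pairs, a factor of 64 fewer. This is the kind of saving that motivates using local windows on high-resolution shallow features and reserving global attention for the lower-resolution deep layers.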

Effective Local-Global Transformer for Natural Image Matting

Natural Image Matting with Attended Global Context

Highly Efficient Natural Image Matting

From Composited to Real-world: Transformer-based Natural Image Matting

A Late Fusion CNN for Digital Matting

TransMatting: Enhancing Transparent Objects Matting with Transformers

LGIT: local–global interaction transformer for low-light image denoising

Natural Image Matting via Guided Contextual Attention

LGFCTR: Local and Global Feature Convolutional Transformer for Image Matching

Lightweight Portrait Matting via Regional Attention and Refinement

Global–local transformer for single-image rain removal

Context-Aware Image Matting for Simultaneous Foreground and Alpha Estimation

Hierarchical local global transformer for point clouds analysis

Long-Range Feature Propagating for Natural Image Matting

MAT: Mask-Aware Transformer for Large Hole Image Inpainting

Deep image matting with cross-layer contextual information propagation

Morpho-Aware Global Attention for Image Matting

Local-to-Global Self-Attention in Vision Transformers

Improving Deep Image Matting via Local Smoothness Assumption

MatteFormer: Transformer-Based Image Matting via Prior-Tokens

Iterative Transductive Learning For Alpha Matting