Learning A Sparse Transformer Network for Effective Image Deraining

Xiang Chen, Hao Li, Mingqiang Li, Jinshan Pan
2023-03-21
Abstract: Transformer-based methods have achieved significant performance in image deraining because they can model the non-local information that is vital for high-quality image reconstruction. In this paper, we find that most existing Transformers use all similarities of the tokens from the query-key pairs for feature aggregation. However, if the tokens from the query are different from those of the key, the self-attention values estimated from these tokens are still involved in feature aggregation, which interferes with clear image restoration. To overcome this problem, we propose an effective DeRaining network, Sparse Transformer (DRSformer), that can adaptively keep the most useful self-attention values for feature aggregation so that the aggregated features better facilitate high-quality image reconstruction. Specifically, we develop a learnable top-k selection operator to adaptively retain the most crucial attention scores from the keys for each query for better feature aggregation. In addition, as the naive feed-forward network in Transformers does not model the multi-scale information that is important for latent clear image restoration, we develop an effective mixed-scale feed-forward network to generate better features for image deraining. To learn an enriched set of hybrid features that combines local context from CNN operators, we equip our model with a mixture-of-experts feature compensator to present a cooperative refinement deraining scheme. Extensive experimental results on commonly used benchmarks demonstrate that the proposed method achieves favorable performance against state-of-the-art approaches. The source code and trained models are available at https://github.com/cschenxiang/DRSformer.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses single image deraining, a typical low-level vision problem: recovering a clear image from an observed image degraded by rain streaks. Because both the clear image and the rain streaks are unknown, this is an ill-posed inverse problem. Traditional methods impose prior assumptions based on the statistical characteristics of rain streaks and clear images, but these hand-designed priors are not robust in complex rain scenes, which limits the deraining quality.

To address this, the paper proposes DRSformer, a sparse Transformer for image deraining. It improves feature aggregation by adaptively retaining only the most useful self-attention values, so that image details and textures are recovered more accurately. The main contributions are:

1. **A sparse Transformer architecture**: It generates high-quality deraining results with more accurate detail and texture restoration.
2. **A simple and effective learnable top-k selection operator**: It adaptively keeps the most useful self-attention values for better feature aggregation.
3. **A feed-forward network based on a mixed-scale fusion strategy**: It explores multi-scale representations to better support image deraining.
4. **Extensive experiments**: The method outperforms existing state-of-the-art approaches on multiple benchmark datasets.

By combining sparse attention with multi-scale information processing, DRSformer improves global feature modeling while also strengthening the representation of local features, which leads to better deraining results.
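
The idea of keeping only the strongest attention scores for each query can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `topk_sparse_attention` is hypothetical, `k` is a fixed argument here rather than the learnable quantity the paper describes, and the code uses plain Python lists over generic tokens instead of image feature maps. For each query, scores outside the top k are masked out before the softmax, so dissimilar keys contribute nothing to the aggregated feature.

```python
import math


def topk_sparse_attention(queries, keys, values, k):
    """Scaled dot-product attention where each query attends only to
    its k highest-scoring keys (illustrative sketch, fixed k)."""
    if not 1 <= k <= len(keys):
        raise ValueError("k must be between 1 and the number of keys")

    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        # Similarity of this query to every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]

        # Indices of the k largest scores; all other keys are dropped.
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        keep = set(ranked[:k])

        # Softmax restricted to the kept scores (dropped keys get weight 0).
        max_kept = max(scores[i] for i in keep)
        weights = [
            math.exp(scores[i] - max_kept) if i in keep else 0.0
            for i in range(len(scores))
        ]
        total = sum(weights)
        weights = [w / total for w in weights]

        # Weighted sum of the value vectors.
        dim = len(values[0])
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    queries = [[1.0, 0.0]]
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0], [2.0], [3.0]]
    print(topk_sparse_attention(queries, keys, values, k=1))
    print(topk_sparse_attention(queries, keys, values, k=3))
```

With `k=1` the query attends only to its best-matching key, so the output is exactly that key's value; with `k` equal to the number of keys the function reduces to ordinary dense softmax attention, in which the dissimilar keys also influence the result.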