A Dynamic Network with Transformer for Image Denoising

Mingjian Song, Wenbo Wang, Yue Zhao
DOI: https://doi.org/10.3390/electronics13091676
IF: 2.9
2024-04-27
Electronics
Abstract: Deep convolutional neural networks (CNNs) can achieve good performance in image denoising due to their superiority in extracting structural information. However, they may ignore the relationships between pixels, which limits their denoising performance. The Transformer, which focuses on pixel-to-pixel relationships, can effectively address this problem. This article aims to make a CNN and a Transformer complement each other in image denoising. In this study, we propose a dynamic network with Transformer for image denoising (DTNet), consisting of a residual block (RB), a multi-head self-attention block (MSAB), and a multidimensional dynamic enhancement block (MDEB). First, the RB not only utilizes a CNN but also lays the foundation for the combination with the Transformer. Then, the MSAB adds positional encoding and applies multi-head self-attention, which preserves sequential positional information while employing the Transformer to obtain global information. Finally, the MDEB uses dimension enhancement and dynamic convolution to improve the adaptive ability. The experiments show that our DTNet is superior to some existing methods for image denoising.
engineering, electrical & electronic; computer science, information systems; physics, applied
What problem does this paper attempt to address?
The paper proposes a new method for image denoising. It points out a limitation of conventional convolutional neural networks (CNNs) in this task: CNNs extract structural information from images effectively, but they may overlook the relationships between pixels, which limits their denoising performance. The Transformer architecture addresses this issue because it focuses on inter-pixel relationships.

To combine the advantages of both, the authors propose a Dynamic Network with Transformer for Image Denoising (DTNet). DTNet has three main components:

1. **Residual Block (RB)**: It uses a CNN for local feature extraction and, through residual learning operations, segments the image to prepare it as input to the Transformer.
2. **Multi-Head Self-Attention Block (MSAB)**: It adds positional encoding and applies multi-head self-attention, retaining the sequence order while capturing global features.
3. **Multidimensional Dynamic Enhancement Block (MDEB)**: It improves adaptability and computational efficiency through dimension enhancement and dynamic convolution.

By integrating the structural feature extraction of CNNs with the Transformer's modeling of inter-pixel relationships, DTNet aims to improve image denoising performance. Experimental results show that DTNet outperforms some existing methods in image denoising. The paper also describes the model design, the choice of loss function, and the implementation of each module in detail.
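
The MSAB idea described above (add positional encoding to a token sequence, then apply multi-head self-attention so every position can attend to every other) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the sinusoidal encoding, the random projection weights, the 8x8x16 feature map, and the function names `sinusoidal_positions` and `multi_head_self_attention` are all assumptions chosen for illustration, and the paper's actual tokenization and layer details may differ.

```python
import numpy as np


def sinusoidal_positions(num_tokens, dim):
    """Fixed sinusoidal positional encoding (an assumed choice, not taken from the paper)."""
    pos = np.arange(num_tokens)[:, None]            # (N, 1)
    i = np.arange(dim)[None, :]                     # (1, D)
    angles = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    enc = np.zeros((num_tokens, dim))
    enc[:, 0::2] = np.sin(angles[:, 0::2])          # even channels use sine
    enc[:, 1::2] = np.cos(angles[:, 1::2])          # odd channels use cosine
    return enc


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over a (N, D) token sequence."""
    n, d = tokens.shape
    head_dim = d // num_heads

    def split(x):
        # (N, D) -> (heads, N, head_dim)
        return x.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(tokens @ w_q), split(tokens @ w_k), split(tokens @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)   # (heads, N, N)
    weights = softmax(scores, axis=-1)                      # each row sums to 1
    heads = weights @ v                                     # (heads, N, head_dim)
    merged = heads.transpose(1, 0, 2).reshape(n, d)         # concatenate heads
    return merged @ w_o, weights


# Hypothetical input: an 8x8 feature map with 16 channels, e.g. from a CNN stage.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 8, 16))

# Flatten spatial positions into a sequence of 64 tokens and add position info,
# since attention by itself is order-agnostic.
tokens = feature_map.reshape(-1, 16)
tokens = tokens + sinusoidal_positions(64, 16)

# Random projections stand in for learned weights.
w_q, w_k, w_v, w_o = (rng.standard_normal((16, 16)) * 0.1 for _ in range(4))
out, weights = multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads=4)
```

Here `out` has the same shape as the token sequence, so it can be reshaped back to a feature map, and `weights` holds one 64x64 attention map per head. That all-pairs map is what gives the block its global receptive field, in contrast to the local window of a convolution.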