A pyramid transformer with cross-shaped windows for low-light image enhancement

Canlin Li, Pengcheng Gao, Shun Song, Jinhua Liu, Lihua Bi
DOI: https://doi.org/10.1007/s00500-023-08788-4
IF: 3.732
2023-06-27
Soft Computing
Abstract: Low-light image enhancement is a low-level vision task. Most existing methods are based on convolutional neural networks (CNNs). The transformer is a predominant deep learning model that has been widely adopted in various fields, such as natural language processing and computer vision. Compared with CNNs, the transformer can capture long-range dependencies and make full use of global contextual information. For low-light enhancement, this capability helps the model learn the correct luminance, color and texture. We introduce the transformer into the low-light image enhancement field. In this paper, we design a pyramid transformer with cross-shaped windows (CSwin-P). CSwin-P consists of an encoder and a decoder, each containing several stages. Each stage contains several enhanced CSwin transformer blocks (ECTBs). An ECTB uses cross-shaped window self-attention and a feed-forward layer with a spatial interaction unit. The spatial interaction unit further captures local contextual information through a gating mechanism. CSwin-P uses implicit positional encoding, and the model is not restricted by image size in the inference phase. Extensive experiments show that our method outperforms current state-of-the-art methods.
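The abstract names cross-shaped window self-attention without spelling it out. The following is a minimal NumPy sketch of the general idea, not the authors' ECTB. It assumes the usual cross-shaped-window formulation, where half of the channels attend within horizontal stripes and the other half within vertical stripes. The function names, the stripe width `sw`, and the use of Q = K = V with no learned projections are all illustrative assumptions. It also assumes the height and width are divisible by `sw`.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def stripe_attention(x, sw):
    """Self-attention restricted to horizontal stripes of height sw.

    x has shape (H, W, C), with H divisible by sw.
    Simplification (assumption): Q = K = V = x, no learned projections.
    """
    H, W, C = x.shape
    # Each stripe of sw rows becomes one sequence of sw * W tokens.
    tokens = x.reshape(H // sw, sw * W, C)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(C)
    out = softmax(scores, axis=-1) @ tokens
    return out.reshape(H, W, C)

def cross_shaped_attention(x, sw):
    """Half the channels use horizontal stripes, half use vertical stripes."""
    H, W, C = x.shape
    half = C // 2
    horiz = stripe_attention(x[..., :half], sw)
    # Vertical stripes: swap H and W, reuse the horizontal routine, swap back.
    vert = stripe_attention(x[..., half:].transpose(1, 0, 2), sw)
    vert = vert.transpose(1, 0, 2)
    return np.concatenate([horiz, vert], axis=-1)

# Usage: the same function runs on inputs of different spatial sizes.
rng = np.random.default_rng(0)
print(cross_shaped_attention(rng.normal(size=(8, 12, 4)), sw=2).shape)
print(cross_shaped_attention(rng.normal(size=(4, 16, 4)), sw=2).shape)
```

Because nothing in the sketch depends on a fixed token count, it accepts any input whose sides are multiples of `sw`, which is in the spirit of the abstract's remark about image size at inference. A real model would add learned projections, multiple heads, and padding for sizes that are not multiples of the stripe width.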
computer science, artificial intelligence, interdisciplinary applications