PromptRR: Diffusion Models as Prompt Generators for Single Image Reflection Removal

Tao Wang, Wanglong Lu, Kaihao Zhang, Wenhan Luo, Tae-Kyun Kim, Tong Lu, Hongdong Li, Ming-Hsuan Yang
2024-02-04
Abstract: Existing single image reflection removal (SIRR) methods using deep learning tend to miss key low-frequency (LF) and high-frequency (HF) differences in images, affecting their effectiveness in removing reflections. To address this problem, this paper proposes a novel prompt-guided reflection removal (PromptRR) framework that uses frequency information as new visual prompts for better reflection performance. Specifically, the proposed framework decouples the reflection removal process into the prompt generation and subsequent prompt-guided restoration. For the prompt generation, we first propose a prompt pre-training strategy to train a frequency prompt encoder that encodes the ground-truth image into LF and HF prompts. Then, we adopt diffusion models (DMs) as prompt generators to generate the LF and HF prompts estimated by the pre-trained frequency prompt encoder. For the prompt-guided restoration, we integrate specially generated prompts into the PromptFormer network, employing a novel Transformer-based prompt block to effectively steer the model toward enhanced reflection removal. The results on commonly used benchmarks show that our method outperforms state-of-the-art approaches. The codes and models are available at
Computer Vision and Pattern Recognition
### What problems does this paper attempt to solve?

This paper addresses key challenges in **Single Image Reflection Removal (SIRR)**. Specifically, existing deep-learning-based SIRR methods often overlook the **low-frequency (LF) and high-frequency (HF) differences** in images when removing reflections, which limits their effectiveness.

#### Main problems:

1. **Loss of low- and high-frequency information**: Existing methods fail to fully capture and use the low- and high-frequency differences in the image, resulting in poor reflection removal.
2. **Lack of effective visual cues**: Traditional SIRR methods do not use frequency information as visual cues to guide the model toward more effective reflection removal.

#### Solutions:

To solve these problems, the authors propose a new framework, **PromptRR (Prompt-guided Reflection Removal)**. This framework improves SIRR in the following ways:

- **Introduce frequency information as visual cues**: Use the low- and high-frequency information of the image as new visual prompts to better guide the reflection removal process.
- **Decouple the reflection removal process**: Divide the reflection removal process into two stages:
  1. **Prompt Generation**: A Frequency Prompt Encoder (FPE) is pre-trained to encode the ground-truth image into low- and high-frequency prompts. Diffusion Models (DMs) then act as prompt generators that produce the LF and HF prompts estimated by the pre-trained encoder.
  2. **Prompt-guided Restoration**: The generated prompts are integrated into a specially designed PromptFormer network, where a Transformer-based prompt block uses them to guide reflection removal.

#### Formula representation:

In the prompt generation process, the forward diffusion process degrades the prompt vector by gradually adding Gaussian noise. Written schematically, with the noise-schedule coefficients omitted:

\[ P_{t + 1}=P_t+\epsilon \]

where \( P_t \) is the prompt vector at step \( t \), and \( \epsilon \) is Gaussian noise.
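The noising step above can be made concrete with a small sketch. It assumes the standard variance-preserving DDPM form, \( P_{t+1} = \sqrt{1-\beta_t}\,P_t + \sqrt{\beta_t}\,\epsilon \), which restores the schedule coefficients that the schematic formula leaves out. The summary does not state the schedule, so `linear_beta_schedule`, its default values, and the function names are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Hypothetical linear noise schedule; the default values are illustrative only."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_noise_step(prompt, beta, rng):
    """One noising step: sqrt(1 - beta) * P_t + sqrt(beta) * eps, with eps ~ N(0, 1)."""
    keep = math.sqrt(1.0 - beta)
    scale = math.sqrt(beta)
    return [keep * p + scale * rng.gauss(0.0, 1.0) for p in prompt]


def forward_diffuse(prompt, betas, rng):
    """Apply every noising step and return the trajectory [P_0, P_1, ..., P_T]."""
    trajectory = [list(prompt)]
    for beta in betas:
        trajectory.append(forward_noise_step(trajectory[-1], beta, rng))
    return trajectory


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same trajectory
    betas = linear_beta_schedule(10)
    trajectory = forward_diffuse([1.0, -0.5, 0.25], betas, rng)
    print(len(trajectory), "prompt vectors, from clean to noisy")
```

Here a prompt is a plain list of floats to keep the sketch free of dependencies; in practice the LF and HF prompts would be tensors produced by the frequency prompt encoder.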
During the generation process, the denoising network \( \epsilon_\theta \) predicts the noise and gradually removes it:

\[ P_{t' + 1}=P_{t'}-\epsilon_\theta(P_c, P_{t'}, t') \]

where \( P_c \) is the conditioning input to the denoising network and \( t' \) indexes the denoising step.

Finally, the joint training loss function is:

\[ L = L_{\text{diff}}^l+L_{\text{diff}}^h+L_1 \]

where \( L_{\text{diff}}^l \) and \( L_{\text{diff}}^h \) are the diffusion losses of the low- and high-frequency diffusion models respectively, and \( L_1 \) is the pixel-level loss of PromptFormer.

Through these improvements, PromptRR outperforms existing methods on multiple publicly available real-world datasets.
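The reverse update and the joint loss can be sketched in a few lines. The loop below applies the schematic update \( P_{t'+1} = P_{t'} - \epsilon_\theta(P_c, P_{t'}, t') \) exactly as written above. The `toy_denoiser` is a hypothetical stand-in for a trained network, and the choice of mean squared error on the predicted noise for the two diffusion terms is an assumption borrowed from standard DDPM training; the summary does not specify either.

```python
def mean_squared_error(pred, target):
    """Average squared difference between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mean_absolute_error(pred, target):
    """Average absolute difference, i.e. an L1-style pixel loss."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def reverse_denoise(noisy_prompt, condition, denoiser, num_steps):
    """Repeat the schematic update P <- P - eps_theta(P_c, P, t') for num_steps steps."""
    prompt = list(noisy_prompt)
    for step in range(num_steps):
        predicted_noise = denoiser(condition, prompt, step)
        prompt = [p - e for p, e in zip(prompt, predicted_noise)]
    return prompt


def toy_denoiser(condition, prompt, step):
    """Hypothetical stand-in for a trained network: treats half the gap to the condition as noise."""
    return [0.5 * (p - c) for p, c in zip(prompt, condition)]


def joint_loss(lf_noise_pred, lf_noise, hf_noise_pred, hf_noise, restored, ground_truth):
    """L = L_diff^l + L_diff^h + L_1, with assumed MSE diffusion terms and an L1 pixel term."""
    loss_diff_l = mean_squared_error(lf_noise_pred, lf_noise)
    loss_diff_h = mean_squared_error(hf_noise_pred, hf_noise)
    loss_l1 = mean_absolute_error(restored, ground_truth)
    return loss_diff_l + loss_diff_h + loss_l1


if __name__ == "__main__":
    # With the toy denoiser, each step halves the distance to the condition.
    print(reverse_denoise([4.0, -2.0], [0.0, 0.0], toy_denoiser, 3))
```

The three terms are summed with equal weight because the loss above is written without weighting coefficients; a real training setup may balance them differently.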