Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization

Hongtao Wu, Yijun Yang, Angelica I Aviles-Rivero, Jingjing Ren, Sixiang Chen, Haoyu Chen, Lei Zhu
2024-10-10
Abstract: Snow degradations present formidable challenges to the advancement of computer vision tasks by the undesirable corruption in outdoor scenarios. While current deep learning-based desnowing approaches achieve success on synthetic benchmark datasets, they struggle to restore out-of-distribution real-world snowy videos due to the deficiency of paired real-world training data. To address this bottleneck, we devise a new paradigm for video desnowing in a semi-supervised spirit to involve unlabeled real data for the generalizable snow removal. Specifically, we construct a real-world dataset with 85 snowy videos, and then present a Semi-supervised Video Desnowing Network (SemiVDN) equipped by a novel Distribution-driven Contrastive Regularization. The elaborated contrastive regularization mitigates the distribution gap between the synthetic and real data, and consequently maintains the desired snow-invariant background details. Furthermore, based on the atmospheric scattering model, we introduce a Prior-guided Temporal Decoupling Experts module to decompose the physical components that make up a snowy video in a frame-correlated manner. We evaluate our SemiVDN on benchmark datasets and the collected real snowy data. The experimental results demonstrate the superiority of our approach against state-of-the-art image- and video-level desnowing methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the main obstacle in video desnowing: existing deep-learning methods perform well on synthetic datasets but are much less effective on real-world snowy videos. Specifically, it targets the following key problems:

1. **Distribution gap**: The difference in distribution between synthetic and real-world data degrades the performance of existing desnowing methods in practice. Real snow has unpredictable shapes and motion patterns that synthetic data cannot fully simulate, so models trained only on synthetic data struggle on real snowy scenes.
2. **Lack of paired real-world data**: Collecting large amounts of paired real-world snowy and clean video is very difficult, because weather conditions, object positions, and camera positions make the two recordings extremely hard to align. This limits the effectiveness of supervised learning.
3. **Improving generalization ability**: For the model to adapt to varied real-world scenarios, unlabeled real-world data must be brought into training through semi-supervised learning.

To solve these problems, the authors propose a Semi-supervised Video Desnowing Network (SemiVDN), which improves on existing methods in the following ways:

- **Constructing a real-world dataset**: 85 unpaired real-world snowy videos are collected for training.
- **Introducing Distribution-driven Contrastive Regularization**: Contrastive learning is used to reduce the distribution gap between synthetic and real data, which helps preserve snow-invariant background details.
- **Prior-guided Temporal Decoupling Experts module**: Based on the atmospheric scattering model, the physical components of a snowy video (the snow layer, transmission map, and atmospheric light) are explicitly decomposed in a frame-correlated manner so that snow can be removed more effectively.

According to the paper's experiments, SemiVDN outperforms existing image- and video-level desnowing methods on benchmark datasets and on the collected real snowy data.

### Formula Summary

- **Atmospheric Scattering Model**:
  \[ I_{\text{snow}}(x)=J(x)T(x)+A(x)(1 - T(x))+S(x) \]
  where:
  - \(I_{\text{snow}}\) represents the video frame degraded by snow;
  - \(J\) represents the clean video frame;
  - \(T\) represents the transmission map;
  - \(A\) represents the atmospheric light;
  - \(S\) represents the snow layer.
- **Prior-guided Restoration Module Formula**:
  \[ F'_B=\frac{F'_I - F'_S-(1 - F'_T)F'_A}{F'_T+\beta} \]
  where:
  - \(F'_I\) is the encoded input feature;
  - \(F'_S\) is the snow feature;
  - \(F'_T\) is the transmission feature;
  - \(F'_A\) is the global atmospheric light feature;
  - \(\beta\) is a hyperparameter, set to \(10^{-8}\), that keeps the denominator nonzero.
- **Overall Optimization Objective**:
  \[ L_{\text{overall}}=L_{\text{sup}}+\mu L_{\text{un}} \]
- **Distribution-driven Contrastive Loss**:
  \[ L_{\text{DCR}}=L_{L1}(U^T_B + U^S_{\text{Snow}}, U^S_B+\hat{G}^S_{\text{Ultra}})-L_{L1}(U^T_B + U^S_{\text{Snow}}, G^S_B+\text{Aug}(U^T_{\text{Snow}}))+\epsilon \]
  where:
  - \(\epsilon\) is a hyperparameter, set to \(10^{-7}\);
  - \(L_{L1}(x, y)\) is the \(\ell_1\)-distance loss between \(x\) and \(y\).

Through these methods, the paper...
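The relationship between the atmospheric scattering model and the restoration formula in the summary above can be checked numerically. The following is a minimal sketch, not the authors' implementation: it works on plain pixel intensities with known components, whereas the paper applies the same algebra to learned features. The function names, the constant atmospheric light, and the value ranges are all assumptions made for illustration.

```python
import random

BETA = 1e-8  # stabilizer value quoted in the summary's restoration formula


def compose_snowy(J, T, A, S):
    """Scattering model with a snow layer: I = J*T + A*(1 - T) + S (element-wise)."""
    return [j * t + a * (1.0 - t) + s for j, t, a, s in zip(J, T, A, S)]


def restore_background(I, S, T, A, beta=BETA):
    """Invert the model: B = (I - S - (1 - T)*A) / (T + beta) (element-wise)."""
    return [(i - s - (1.0 - t) * a) / (t + beta) for i, s, t, a in zip(I, S, T, A)]


def demo(n=16, seed=0):
    """Synthesize a snowy signal, invert it, and return the largest absolute error."""
    rng = random.Random(seed)
    J = [rng.random() for _ in range(n)]              # clean intensities in [0, 1)
    T = [0.3 + 0.7 * rng.random() for _ in range(n)]  # transmission in [0.3, 1.0) (assumed range)
    A = [0.9] * n                                     # constant atmospheric light (assumption)
    S = [0.2 * rng.random() for _ in range(n)]        # snow layer (assumed range)
    I = compose_snowy(J, T, A, S)
    B = restore_background(I, S, T, A)
    return max(abs(b - j) for b, j in zip(B, J))


if __name__ == "__main__":
    print(f"max reconstruction error: {demo():.2e}")
```

When the snow layer, transmission, and atmospheric light are known exactly, the inversion recovers the clean signal up to the small bias introduced by \(\beta\). In the network these components are estimated, so the quality of the restored background depends on how well they are decomposed.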