Learning Model-Blind Temporal Denoisers without Ground Truths

Yanghao Li, Bichuan Guo, Jiangtao Wen, Zhen Xia, Shan Liu, Yuxing Han
DOI: https://doi.org/10.48550/arXiv.2007.03241
2021-03-31
Abstract: Denoisers trained with synthetic data often fail to cope with the diversity of unknown noises, giving way to methods that can adapt to existing noise without knowing its ground truth. The previous image-based method leads to noise overfitting when directly applied to video denoisers, and its inadequate management of temporal information, especially under occlusion and lighting variation, considerably hinders its denoising performance. In this paper, we propose a general framework for video denoising networks that successfully addresses these challenges. A novel twin sampler assembles training data by decoupling inputs from targets without altering semantics, which not only effectively solves the noise overfitting problem but also efficiently generates better occlusion masks by checking optical flow consistency. An online denoising scheme and a warping loss regularizer are employed for better temporal alignment. Lighting variation is quantified based on the local similarity of aligned frames. Our method consistently outperforms the prior art by 0.6-3.2 dB PSNR on multiple noises, datasets and network architectures. State-of-the-art results on reducing model-blind video noises are achieved. Extensive ablation studies are conducted to demonstrate the significance of each technical component.
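The two correspondence-management ideas in the abstract (occlusion masks from optical-flow consistency, and lighting variation measured by the local similarity of aligned frames) can be illustrated with a minimal pure-Python sketch. It rests on stated assumptions: the occlusion test is the common forward-backward consistency check from unsupervised optical-flow work, and the lighting score compares local mean intensities in a small window. The function names and the parameters `alpha`, `beta`, `radius`, and `tau` are hypothetical, and the paper's exact formulas may differ.

```python
import math
from typing import List, Tuple

# Flow fields are H x W grids of (dx, dy) displacements; frames are H x W grids.
Flow = List[List[Tuple[float, float]]]
Frame = List[List[float]]


def occlusion_mask(fwd: Flow, bwd: Flow, alpha: float = 0.01, beta: float = 0.5) -> List[List[bool]]:
    """Forward-backward consistency check (assumed rule, not the paper's exact one).

    fwd maps frame t to frame t+1 and bwd maps t+1 back to t. Returns True
    where a pixel is treated as visible. alpha and beta are hypothetical
    thresholds.
    """
    h, w = len(fwd), len(fwd[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = fwd[y][x]
            # Nearest-neighbour position this pixel lands on in frame t+1.
            tx, ty = round(x + dx), round(y + dy)
            if not (0 <= tx < w and 0 <= ty < h):
                continue  # leaves the frame: keep it marked as occluded
            bx, by = bwd[ty][tx]
            # For a consistent pair the two flows cancel: fwd + bwd is close to 0.
            diff = (dx + bx) ** 2 + (dy + by) ** 2
            mag = dx * dx + dy * dy + bx * bx + by * by
            mask[y][x] = diff < alpha * mag + beta
    return mask


def local_similarity(a: Frame, b: Frame, radius: int = 1, tau: float = 0.1) -> List[List[float]]:
    """Per-pixel similarity in (0, 1] between frame a and aligned frame b.

    Hypothetical measure: compares local mean intensities in a
    (2 * radius + 1)^2 window, so a lighting change lowers the score.
    tau is an illustrative temperature.
    """
    h, w = len(a), len(a[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sa = sb = 0.0
            n = 0
            # Window clipped at the image borders.
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    sa += a[yy][xx]
                    sb += b[yy][xx]
                    n += 1
            out[y][x] = math.exp(-abs(sa - sb) / (n * tau))
    return out


fwd = [[(1.0, 0.0)] * 4]   # every pixel moves one step right
bwd = [[(-1.0, 0.0)] * 4]  # and back again
print(occlusion_mask(fwd, bwd))  # [[True, True, True, False]]
```

With uniform one-pixel rightward motion, the last column is marked occluded because its forward flow leaves the frame. In training, masks like these would typically down-weight the loss wherever the warped neighbor frame cannot be trusted.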
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses video denoising without an explicit noise model. Specifically, it asks how to train a video denoiser that adapts to various unknown noises without using ground-truth data. Traditional methods usually either assume a specific noise model, such as additive white Gaussian noise (AWGN), or train on synthesized realistic noisy data, and they perform poorly on the complex noise found in real videos. Moreover, directly extending image denoising methods to video leads to noise overfitting and inadequate management of temporal information, particularly under occlusion and illumination changes. To overcome these challenges, the paper proposes a general-purpose video denoising framework. The main contributions are:

1. **Improved temporal alignment**: An online denoising scheme and a new warping-loss regularizer improve the content awareness of the optical-flow estimation network, which in turn improves temporal alignment.
2. **Enhanced correspondence management**: Two components are combined. The first generates more accurate occlusion masks by checking optical-flow consistency; the second quantifies illumination changes from the local similarity of aligned frames.
3. **Twin sampler**: The paper identifies the noise-overfitting problem that arises when image methods are directly extended to video denoising, and proposes a novel twin sampler that decouples inputs from targets to prevent noise overfitting and, as a by-product, yields better occlusion inference.

With these innovations, the method performs well across multiple noise types, datasets, and network architectures, improving PSNR by 0.6-3.2 dB over the prior art and achieving state-of-the-art results.
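The decoupling idea behind the twin sampler can be illustrated with a hypothetical sketch. The sampling rule used here is an assumption borrowed from neighbor sub-sampling and may differ from the paper's twin sampler: each 2x2 cell contributes two different pixels, one to an input clip and one to a target clip, and the same choice is reused across frames so both clips keep identical content and motion. If the noise is independent from pixel to pixel, input and target then carry independent noise realizations, so a network cannot simply learn to reproduce the input noise. The name `twin_sample` is illustrative, not from the paper.

```python
import random
from typing import List, Tuple

Frame = List[List[float]]
OFFSETS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # positions inside a 2x2 cell


def twin_sample(frames: List[Frame], rng: random.Random) -> Tuple[List[Frame], List[Frame]]:
    """Hypothetical twin sampler: split a noisy clip into two half-size clips.

    In every 2x2 cell two distinct pixels are drawn: the first goes to the
    input twin, the second to the target twin. The same draw is reused for
    all frames so both twins keep identical content and motion.
    """
    h, w = len(frames[0]), len(frames[0][0])
    if h % 2 or w % 2:
        raise ValueError("frame height and width must be even")
    # One (input_offset, target_offset) pair per cell, shared across time.
    picks = [[rng.sample(OFFSETS, 2) for _ in range(w // 2)] for _ in range(h // 2)]
    inputs, targets = [], []
    for f in frames:
        f_in = [[0.0] * (w // 2) for _ in range(h // 2)]
        f_tg = [[0.0] * (w // 2) for _ in range(h // 2)]
        for i in range(h // 2):
            for j in range(w // 2):
                (ai, aj), (bi, bj) = picks[i][j]
                f_in[i][j] = f[2 * i + ai][2 * j + aj]
                f_tg[i][j] = f[2 * i + bi][2 * j + bj]
        inputs.append(f_in)
        targets.append(f_tg)
    return inputs, targets


# Usage: a 3-frame 4x4 clip whose values encode (t, y, x) as 100t + 10y + x.
clip = [[[100 * t + 10 * y + x for x in range(4)] for y in range(4)] for t in range(3)]
ins, tgs = twin_sample(clip, random.Random(0))
print(len(ins), len(ins[0]), len(ins[0][0]))  # 3 2 2
```

Reusing one draw for every frame is the design choice that keeps the semantics unchanged: both twins see the same scene moving the same way, and only the pixel grid they sample differs.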