Abstract: Accurate alignment is crucial for video denoising. However, estimating alignment in noisy environments is challenging. This paper introduces a cascading refinement video denoising method that refines alignment and restores images simultaneously. Better alignment enables restoration of more detailed information in each frame, and better image quality in turn leads to better alignment. The method achieves SOTA performance by a large margin on the CRVD dataset. To deal with multi-level noise, it also generates an uncertainty map after each iteration, which avoids redundant computation on easily restored videos. As a result, overall computation is reduced by 25% on average.
### What problem does this paper attempt to address?
This paper attempts to solve two key problems in video denoising:
1. **Accurate Alignment**:
- Video frames shot under low-light conditions inevitably contain severe noise, which degrades video quality and also harms subsequent video analysis tasks, such as visual odometry or tracking algorithms, that are very sensitive to noise.
- Video denoising algorithms need to estimate the alignment between frames and use the redundant information in unaligned video frames for restoration. However, it is very challenging to estimate accurate alignment in a noisy environment. Incorrect alignment will lead to artifacts in the recovered frames, and the alignment estimation error is likely to propagate from one frame to another, increasing the adverse effects.
- Small or fast-moving objects, changing lighting conditions, and occlusion all seriously degrade alignment performance.
2. **Multi-level Noise Handling**:
- Another challenge in video denoising is how to handle multi-level noise without prior noise information. The noise level depends on many factors, such as ISO, luminance, temperature, etc., and is usually unknown in practical applications. Even in different regions of the same image, the noise level may be different.
- Existing deep-learning denoising methods either require a single noise level as input or rely on noise information as input. Some methods handle multi-level noise by adding additional network structures, but these modules are redundant for images with mild noise.
### Solutions Proposed in the Paper
To solve the above problems, this paper proposes a **Cascading Refinement Video Denoising with Uncertainty Adaptivity** method, which specifically includes the following three main parts:
1. **Pre-denoising and Patch Matching**:
- More accurate initial alignment estimates are obtained through pre-denoising, making subsequent iterative refinements easier.
- Normalized Cross-Correlation (NCC) is used for patch matching to find the corresponding patches for each specific patch in the support frames.
2. **Cascading Refinement**:
- An iterative refinement structure estimates the alignment and restores the image simultaneously, so image quality and alignment accuracy improve with each iteration.
- Flow-guided Deformable Convolution is used to provide offset diversity, thereby improving the denoising performance.
- The feature map of the reference frame is fused with the feature map of the aligned support frame as the input for the next iteration.
3. **Uncertainty Adaptivity**:
- An uncertainty map is generated after each iteration to estimate the error between the recovered frame and the ground truth.
- The magnitude of the uncertainty determines whether to continue iterating. If the uncertainty is below a certain threshold, the current result is output directly; otherwise, refinement continues.
- This early exit reduces the amount of computation by 25% on average.
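
The uncertainty-driven stopping rule in part 3 can be illustrated with a minimal sketch. It assumes the uncertainty map is reduced to its mean before being compared with the threshold; this summary does not say how the paper aggregates it. The names `denoise_adaptive`, `refine_step`, and `make_toy_step` are hypothetical, and the toy stage simply halves the error toward a known clean signal as a stand-in for a learned refinement stage.

```python
def denoise_adaptive(noisy, refine_step, threshold, max_iters=6):
    """Run refinement stages until the mean uncertainty falls below threshold.

    refine_step is a hypothetical callable:
        (current_estimate, noisy_input) -> (new_estimate, uncertainty_map)
    where both outputs are flat lists of floats.
    Returns the final estimate and the number of stages actually run.
    """
    estimate = noisy
    used = 0
    for used in range(1, max_iters + 1):
        estimate, uncertainty = refine_step(estimate, noisy)
        # Assumption: the map is summarized by its mean value.
        if sum(uncertainty) / len(uncertainty) < threshold:
            break  # confident enough: skip the remaining stages
    return estimate, used


def make_toy_step(clean):
    """Toy stand-in for one refinement stage (not the paper's network).

    It moves the estimate halfway toward `clean` and reports the remaining
    absolute error as the uncertainty map.
    """
    def step(estimate, noisy):
        new = [e + 0.5 * (c - e) for e, c in zip(estimate, clean)]
        unc = [abs(c - n) for c, n in zip(clean, new)]
        return new, unc
    return step


clean = [0.0] * 4
step = make_toy_step(clean)
_, stages_mild = denoise_adaptive([0.1, -0.1, 0.1, -0.1], step, threshold=0.06)
_, stages_heavy = denoise_adaptive([0.8, -0.8, 0.8, -0.8], step, threshold=0.06)
print(stages_mild, stages_heavy)  # prints: 1 4
```

In this toy setup the mildly noisy input exits after one stage while the heavily noisy input needs four, which is the mechanism behind the reported compute savings.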
Through these innovations, this method has achieved significantly better performance than existing methods on the CRVD dataset and can flexibly adapt to videos with different noise levels.
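
The NCC patch matching mentioned in part 1 can be sketched as an exhaustive search over a small window. This is a minimal pure-Python illustration, assuming the zero-mean form of NCC and a square search window. The patch size, search radius, and the function names `ncc`, `extract_patch`, and `best_match` are illustrative choices, not details taken from the paper.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized flat patches."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat patch has no defined correlation; treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def extract_patch(frame, top, left, size):
    """Flatten the size x size patch whose top-left corner is (top, left)."""
    return [frame[r][c]
            for r in range(top, top + size)
            for c in range(left, left + size)]


def best_match(ref_frame, support_frame, top, left, size, radius):
    """Find the patch in support_frame that best matches a reference patch.

    Searches all top-left corners within `radius` pixels of (top, left),
    clipped to the frame, and returns ((row, col), score) of the highest NCC.
    """
    ref_patch = extract_patch(ref_frame, top, left, size)
    height, width = len(support_frame), len(support_frame[0])
    best_score, best_pos = -2.0, (top, left)  # NCC is never below -1
    for r in range(max(0, top - radius), min(height - size, top + radius) + 1):
        for c in range(max(0, left - radius), min(width - size, left + radius) + 1):
            score = ncc(ref_patch, extract_patch(support_frame, r, c, size))
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

Because this form of NCC subtracts the patch mean and normalizes by patch energy, the match is unaffected by a brightness offset or positive gain between the reference and support frames.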