Video Inpainting Based on Residual Convolution Attention Network

Li De-cai, Yan Qun, Yao Jian-min, Lin Zhi-xian, Dong Ze-yu
DOI: https://doi.org/10.37188/cjlcd.2021-0196
2022-01-01
Abstract: Video inpainting, which aims to fill in missing regions of a video, remains challenging because it is difficult to preserve the precise spatial and temporal coherence of video content. Existing methods suffer from discontinuous semantic information, blurriness, and temporal artifacts, and increasingly complex network designs slow the network down. To address these problems, this paper proposes a residual convolution attention network (RCAN) for video inpainting. Introducing a self-attention mechanism and a global attention mechanism into the residual network enhances the network's ability to learn the spatio-temporal features of all input frames. A spatial-temporal adversarial loss function is proposed to optimize RCAN, which improves inpainting quality. In addition, the number of layers and parameters of the network can be configured with a high degree of freedom, which improves its practical applicability. Experimental results show that the network achieves an average PSNR of 30.68 dB, an SSIM of 0.961, and an FID of 0.113 on the DAVIS and YouTube-VOS datasets. The method meets the inpainting quality requirements of practical scenarios and provides a new approach to video inpainting.
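For readers unfamiliar with the PSNR figure quoted in the abstract, the following is a minimal sketch of how a per-video average PSNR is commonly computed. It is not the authors' evaluation code: the function names `psnr` and `mean_video_psnr` are hypothetical, frames are assumed to be flattened sequences of 8-bit pixel values, and the abstract does not say whether the paper averages over whole frames or only over the inpainted regions.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    Assumption: pixels are given as flat sequences of numbers in [0, max_value].
    """
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_video_psnr(reference_frames, restored_frames, max_value=255.0):
    """Average of per-frame PSNR values over a video (hypothetical helper)."""
    if len(reference_frames) != len(restored_frames) or len(reference_frames) == 0:
        raise ValueError("videos must be non-empty and have the same frame count")
    scores = [psnr(a, b, max_value) for a, b in zip(reference_frames, restored_frames)]
    return sum(scores) / len(scores)
```

Higher PSNR means the restored frame is closer to the reference; an error of one gray level on every 8-bit pixel corresponds to roughly 48 dB.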