Deep Interactive Video Inpainting: an Invisibility Cloak for Harry Potter.
Cheng Chen, Jiayin Cai, Yao Hu, Xu Tang, Xinggang Wang, Chun Yuan, Xiang Bai, Song Bai
DOI: https://doi.org/10.1145/3474085.3475262
2021-01-01
Abstract: In this paper, we propose a new task, deep interactive video inpainting, and an application that lets users interact with machines. To the best of our knowledge, this is the first deep learning-based interactive video inpainting framework that uses only free-form user input (i.e., scribbles) as guidance instead of mask annotations, which gives it academic, entertainment, and commercial value. Given a user's scribbles on a given frame, it simultaneously performs interactive video object segmentation and video inpainting throughout the whole video. To achieve this, we use a shared spatial-temporal memory module that combines segmentation and inpainting into an end-to-end pipeline. In our framework, the past frames with object masks (either the users' scribbles or the predicted masks) constitute an external memory, and the current frame, serving as the query, is segmented and inpainted by reading the visual cues stored in that memory. Furthermore, our method allows users to iteratively refine the segmentation results, which effectively improves inpainting performance on frames where the segmentation is poor. Hence, one can obtain high-quality video inpainting results even on challenging video sequences. Qualitative and quantitative experimental results demonstrate the superiority of our approach.
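The abstract does not specify how the current frame "reads" the memory. As a rough illustration only, the sketch below assumes a generic key-value attention read: each query location is compared against every memory location, and the stored values are averaged with softmax weights. The function name `memory_read`, the dot-product similarity, and the scaling factor are all assumptions made for this sketch, not details taken from the paper.

```python
import math

def memory_read(query_keys, memory_keys, memory_values):
    """Hypothetical soft read from an external memory (assumed design).

    query_keys:    list of key vectors, one per location in the current frame
    memory_keys:   list of key vectors, one per location in the past frames
    memory_values: list of value vectors stored alongside the memory keys
    Returns one retrieved value vector per query location.
    """
    scale = math.sqrt(len(memory_keys[0]))  # assumed scaling, as in scaled dot-product attention
    value_dim = len(memory_values[0])
    retrieved = []
    for q in query_keys:
        # Similarity between this query location and every memory location.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in memory_keys]
        # Numerically stable softmax over the memory locations.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted average of the stored values.
        retrieved.append([
            sum(w * v[d] for w, v in zip(weights, memory_values))
            for d in range(value_dim)
        ])
    return retrieved

# Toy memory with two locations; the values could encode mask or appearance cues.
memory_keys = [[10.0, 0.0], [0.0, 10.0]]
memory_values = [[1.0, 0.0], [0.0, 1.0]]

# A query equally similar to both memory keys retrieves the average of the values.
print(memory_read([[0.0, 0.0]], memory_keys, memory_values))  # [[0.5, 0.5]]
```

In a pipeline of the kind the abstract describes, the retrieved features would presumably feed both a segmentation decoder and an inpainting decoder, and user refinements would update the masks stored in memory. That wiring is not shown here.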