CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

Hao Ouyang, Qiuyu Wang, Yuxi Xiao, Qingyan Bai, Juntao Zhang, Kecheng Zheng, Xiaowei Zhou, Qifeng Chen, Yujun Shen
DOI: https://doi.org/10.48550/arXiv.2308.07926
2023-08-16
Abstract: We present the content deformation field (CoDeF) as a new type of video representation. It consists of a canonical content field, which aggregates the static contents of the entire video, and a temporal deformation field, which records the transformations from the canonical image (i.e., the image rendered from the canonical content field) to each individual frame along the time axis. Given a target video, these two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline. We deliberately introduce regularizations into the optimization process that encourage the canonical content field to inherit semantics (e.g., the object shape) from the video. With such a design, CoDeF naturally supports lifting image algorithms to video processing: one can apply an image algorithm to the canonical image and effortlessly propagate the outcome to the entire video with the aid of the temporal deformation field. We experimentally show that CoDeF can lift image-to-image translation to video-to-video translation, and lift keypoint detection to keypoint tracking, without any training. More importantly, because our lifting strategy deploys the algorithm on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects such as water and smog. The project page can be found at https://qiuyu96.github.io/CoDeF/.
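To make the rendering and lifting ideas concrete, here is a minimal sketch under several assumptions that are not taken from the paper. The canonical content field is replaced by a plain discretized canonical image, and the learned temporal deformation field is replaced by a hand-written, hypothetical function (`shift_deform`) that, for each frame pixel at time t, returns the canonical-space location to sample (a backward warp). The helper names `render_frame`, `render_video` and `lift` are illustrative only, and the joint optimization that fits both fields to a real video is omitted entirely. The point is the lifting step: an image operation is applied once to the canonical image, and every frame inherits it through the same deformation, so the result is consistent across frames by construction.

```python
import numpy as np


def bilinear_sample(image, u, v):
    """Sample image (H, W, C) at float coords u (column) and v (row), clamped to the border."""
    h, w = image.shape[:2]
    u = np.clip(u, 0.0, w - 1.0)
    v = np.clip(v, 0.0, h - 1.0)
    x0 = np.floor(u).astype(int)
    y0 = np.floor(v).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (u - x0)[..., None]  # horizontal interpolation weight, shape (H, W, 1)
    wy = (v - y0)[..., None]  # vertical interpolation weight
    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom


def shift_deform(xs, ys, t):
    # Hypothetical stand-in for the learned temporal deformation field:
    # frame t views the canonical image through a window moved t pixels right.
    return xs + t, ys


def render_frame(canonical, deform, t, height, width):
    """Render frame t by looking up each pixel's location in canonical space."""
    ys, xs = np.mgrid[0:height, 0:width].astype(float)
    u, v = deform(xs, ys, t)  # frame pixel -> canonical coordinate
    return bilinear_sample(canonical, u, v)


def render_video(canonical, deform, num_frames, height, width):
    """Reconstruct every frame from the single canonical image."""
    return np.stack(
        [render_frame(canonical, deform, t, height, width) for t in range(num_frames)]
    )


def lift(image_op, canonical, deform, num_frames, height, width):
    """Apply an image algorithm once to the canonical image, then propagate it."""
    edited = image_op(canonical)
    return render_video(edited, deform, num_frames, height, width)


# Usage: a random 8x12 canonical image viewed through three 8x8 frames.
rng = np.random.default_rng(0)
canonical = rng.random((8, 12, 3))
video = render_video(canonical, shift_deform, num_frames=3, height=8, width=8)
inverted = lift(lambda img: 1.0 - img, canonical, shift_deform, 3, 8, 8)
```

Because bilinear weights sum to one, lifting a per-pixel operation such as color inversion gives the same result as applying it to every rendered frame; for spatially varying algorithms (e.g., a learned image-to-image translator), running on the canonical image once is what avoids per-frame flicker.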
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?