Video Harmonization with Triplet Spatio-Temporal Variation Patterns

Zonghui Guo, XinYu Han, Jie Zhang, Shiguang Shan, Haiyong Zheng
DOI: https://doi.org/10.1109/cvpr52733.2024.01814
2024-01-01
Abstract: Video harmonization is an important and challenging task that aims to obtain visually realistic composite videos by automatically adjusting the foreground's appearance to harmonize with the background. Inspired by the short-term and long-term gradual adjustment process of manual harmonization, we present a Video Triplet Transformer framework to model three spatio-temporal variation patterns within videos, i.e., short-term spatial as well as long-term global and dynamic, for video-to-video tasks like video harmonization. Specifically, for short-term harmonization, we adjust the foreground appearance to be consistent with the background in the spatial dimension based on neighboring frames; for long-term harmonization, we not only explore global appearance variations to enhance temporal consistency but also alleviate motion offset constraints to align similar contextual appearances dynamically. Extensive experiments and ablation studies demonstrate the effectiveness of our method, achieving state-of-the-art performance in video harmonization, video enhancement, and video demoireing tasks. We also propose a temporal consistency metric to better evaluate the harmonized videos. Code is available at https://github.com/zhenglablVideoTripletTransformer.