Abstract: Editing faces in videos is a popular yet challenging task in computer vision and graphics that encompasses various applications, including facial attractiveness enhancement, makeup transfer, face replacement, and expression manipulation. Directly applying existing warping methods to video face editing leads to temporal incoherence in the synthesized videos. This problem cannot be addressed simply by employing face tracking techniques or manual intervention, because it is difficult to eliminate the subtle temporal incoherence of the facial feature point localizations in a video sequence. In this article, we propose a temporal-spatial-smooth warping (TSSW) method to achieve high temporal coherence for video face editing. TSSW is based on two observations: (1) the control lattices are critical for generating warping surfaces and achieving temporal coherence between consecutive video frames, and (2) the temporal coherence and spatial smoothness of the control lattices can be simultaneously and effectively preserved. Based upon these observations, we impose a temporal coherence constraint on the control lattices of two consecutive frames, as well as a spatial smoothness constraint on the control lattice of the current frame. TSSW calculates the control lattice (in either the horizontal or vertical direction) by updating the control lattice (in the corresponding direction) of the preceding frame, i.e., by minimizing a novel energy function that unifies a data-driven term, a smoothness term, and feature point constraints.
The contributions of this article are twofold: (1) we develop TSSW, which is robust to the subtle temporal incoherence of the facial feature point localizations and is effective in preserving the temporal coherence and spatial smoothness of the control lattices for editing faces in videos, and (2) we present a new unified video face editing framework that is capable of improving the performance of facial attractiveness enhancement, makeup transfer, face replacement, and expression manipulation.
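
To make the idea of updating a control lattice from its preceding frame concrete, the following is a minimal toy sketch, not the energy function of the article (the abstract does not give its exact form). It assumes a quadratic energy with three terms: a temporal term that keeps the lattice close to the previous frame, a spatial term that penalizes differences between neighboring lattice nodes, and soft feature point constraints that are assumed, for simplicity, to sit exactly on lattice nodes. The function name `update_lattice`, the weights `w_t`, `w_s`, `w_f`, and the least-squares formulation are all hypothetical choices made for illustration.

```python
import numpy as np

def update_lattice(prev, feat_idx, feat_disp, w_t=1.0, w_s=1.0, w_f=10.0):
    """Toy least-squares update of one displacement channel of a control lattice.

    prev      : (h, w) lattice values from the preceding frame
    feat_idx  : list of (row, col) lattice nodes carrying a feature constraint
    feat_disp : target displacement at each constrained node
    All weights and the quadratic form are assumptions, not the article's energy.
    """
    h, w = prev.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    rows, rhs = [], []

    # Temporal term: stay close to the previous frame's lattice.
    rows.append(np.sqrt(w_t) * np.eye(n))
    rhs.append(np.sqrt(w_t) * prev.ravel())

    # Spatial term: penalize differences between horizontally and
    # vertically adjacent lattice nodes.
    pairs = np.concatenate([
        np.stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()], axis=1),
        np.stack([idx[:-1, :].ravel(), idx[1:, :].ravel()], axis=1),
    ])
    D = np.zeros((len(pairs), n))
    D[np.arange(len(pairs)), pairs[:, 0]] = -1.0
    D[np.arange(len(pairs)), pairs[:, 1]] = 1.0
    rows.append(np.sqrt(w_s) * D)
    rhs.append(np.zeros(len(pairs)))

    # Feature term: soft constraints at the selected lattice nodes.
    S = np.zeros((len(feat_idx), n))
    for k, (i, j) in enumerate(feat_idx):
        S[k, idx[i, j]] = 1.0
    rows.append(np.sqrt(w_f) * S)
    rhs.append(np.sqrt(w_f) * np.asarray(feat_disp, dtype=float))

    # Minimize the sum of squared residuals of all three terms at once.
    A = np.vstack(rows)
    b = np.concatenate(rhs)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x.reshape(h, w)
```

In this sketch, calling `update_lattice(np.zeros((4, 4)), [(1, 1)], [1.0])` pulls the node at `(1, 1)` toward the requested displacement while the temporal and spatial terms damp the change and spread it over neighboring nodes. Raising `w_t` relative to `w_f` trades responsiveness to the feature points for stability across frames, which is the kind of trade-off the abstract describes.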

Video Editing with Temporal, Spatial and Appearance Consistency

Attention-guided Temporally Coherent Video Object Matting

Robust Visual Tracking Via CAMShift and Structural Local Sparse Appearance Model

Task-agnostic Temporally Consistent Facial Video Editing

Spatio-Temporal Video Segmentation of Static Scenes and Its Applications

A Global Approach for Video Matching

Temporally Consistent Object Editing in Videos using Extended Attention

Discontinuity-aware Video Object Cutout

Refilming with Depth-Inferred Videos

Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

Interactive space-time video matting approach

Occlusion-aware Video Temporal Consistency

Video Face Editing Using Temporal-Spatial-Smooth Warping

Alignment-guided Temporal Attention for Video Action Recognition

Adaptive Background Matting Using Background Matching

VIA: Unified Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

Context-Aware Talking-Head Video Editing

ReVideo: Remake a Video with Motion and Content Control

Contour-assistance-based video matting localization

Adaptive Selection of Reference Frames for Video Object Segmentation

Matching-Area-Based Seam Carving for Video Retargeting