Towards Interactive Facial Image Inpainting by Text or Exemplar Image.

Ailin Li,Lei Zhao,Zhiwen Zuo,Zhizhong Wang,Wei Xing,Dongming Lu
DOI: https://doi.org/10.1007/978-3-031-27077-2_11
2023-01-01
Abstract:Facial image inpainting aims to fill visually realistic and semantically new pixels for masked or missing pixels in a face image. Although current methods have made progress in achieving high visual quality, the controllable diversity of face inpainting remains an open issue. This paper proposes a new facial image inpainting interaction mode, which enables filling semantic contents based on the texts or exemplar images provided by users. We use the powerful image-text representation abilities from the recently introduced Contrastive Language-Image Pre-training (CLIP) models to achieve this interactive face inpainting. We present two thoughts on our method. Specifically, we first explore a simple and effective optimization-based text-guided facial inpainting method in which a CLIP model is used as a loss network to modify the latent code iteratively in response to the text prompt. Next, we describe a multi-modal inpainting mapper network to map the input conditions (e.g., text or image) into corresponding latent code changes, supporting the guidance of different text prompts and exemplars within one model. We also introduce an exemplar-semantic similarity loss, which maps the inpainted facial image and the exemplar image into the CLIP's space to measure their similarity. This loss enables the generated image to include high-level semantic attributes from the exemplar image. Through extensive experiments, we demonstrate the effectiveness of our method in interactive facial inpainting based on the texts or exemplars.
What problem does this paper attempt to address?