An Inpainting-Infused Pipeline for Attire and Background Replacement

Felipe Rodrigues Perche-Mahlow, André Felipe-Zanella, William Alberto Cruz-Castañeda, Marcellus Amadeus
2024-02-06
Abstract: In recent years, groundbreaking advancements in Generative Artificial Intelligence (GenAI) have triggered a transformative paradigm shift, significantly influencing various domains. In this work, we specifically explore an integrated approach, leveraging advanced techniques in GenAI and computer vision, with an emphasis on image manipulation. The methodology unfolds through several stages, including depth estimation, the creation of inpaint masks based on depth information, the generation and replacement of backgrounds utilizing Stable Diffusion in conjunction with Latent Consistency Models (LCMs), and the subsequent replacement of clothes and application of aesthetic changes through an inpainting pipeline. Experiments conducted in this study underscore the methodology's efficacy, highlighting its potential to produce visually captivating content. The convergence of these advanced techniques allows users to input photographs of individuals and manipulate them to modify clothing and background based on specific prompts without manually supplying inpainting masks, effectively placing the subjects within the vast landscape of creative imagination.
Computer Vision and Pattern Recognition, Artificial Intelligence, Computation and Language
What problem does this paper attempt to address?
The paper proposes an integrated approach for replacing a person's clothing and background in images. The method combines generative artificial intelligence (GenAI) with computer vision techniques, in particular depth estimation, inpainting, and background generation. The study proceeds in the following steps:

1. **Depth Estimation and Inpainting Mask Creation**: The MiDaS algorithm is used for depth estimation, and an inpainting mask is created from the depth information. Threshold segmentation then determines which parts are retained or modified, and it is combined with facial recognition so that facial features are not altered.
2. **Background Generation and Replacement**: Stable Diffusion and Latent Consistency Models (LCMs) are used to generate new background images and replace the original background.
3. **Clothing Generation**: The inpainting model of Stable Diffusion XL is used to generate new clothing styles based on prompts while retaining specific areas.

The experimental results demonstrate the applicability and flexibility of the method in different scenarios: it generates images that match the requested backgrounds and clothing styles. The paper also discusses some challenges, such as inaccurate hand, foot, and arm positions in certain cases.

In summary, this work lets users modify the clothing and background of people in photos without manually creating complex inpainting masks, which broadens the range of creative applications.
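The depth-based mask creation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_inpaint_masks`, the `threshold` value, and the `face_box` argument are hypothetical, and the depth map is assumed to be a 2-D array of relative inverse depth (larger values are closer to the camera), which is the convention MiDaS uses. How the paper actually combines the face region with the thresholded mask is also an assumption here.

```python
import numpy as np


def build_inpaint_masks(depth, threshold=0.5, face_box=None):
    """Return (background_mask, clothing_mask) as boolean arrays.

    depth:     2-D array of relative inverse depth (larger = closer).
    threshold: hypothetical cut-off on the normalised depth that
               separates the subject from the background.
    face_box:  optional (top, left, bottom, right) region, assumed to
               come from a separate face detector, that is left untouched.
    """
    depth = np.asarray(depth, dtype=np.float64)
    span = depth.max() - depth.min()
    # Min-max normalise to [0, 1]; guard against a constant depth map.
    norm = (depth - depth.min()) / span if span > 0 else np.zeros_like(depth)

    subject = norm >= threshold      # pixels close to the camera
    background_mask = ~subject       # region to regenerate as background

    clothing_mask = subject.copy()   # region eligible for attire inpainting
    if face_box is not None:
        top, left, bottom, right = face_box
        clothing_mask[top:bottom, left:right] = False  # keep the face as-is
    return background_mask, clothing_mask


# Toy 4x4 depth map: the central 2x2 block is the "subject".
depth = np.array([[0.1, 0.2, 0.1, 0.0],
                  [0.1, 0.9, 1.0, 0.1],
                  [0.2, 0.8, 0.9, 0.1],
                  [0.1, 0.2, 0.1, 0.0]])
background, clothing = build_inpaint_masks(depth, face_box=(1, 1, 2, 3))
```

In a full pipeline, the two boolean masks would typically be converted to images and passed to the background-generation and clothing-inpainting stages respectively; a single global threshold is the simplest choice and may need tuning per image.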