RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images

Benzhi Wang, Jingkai Zhou, Jingqi Bai, Yang Yang, Weihua Chen, Fan Wang, Zhen Lei
2024-09-06
Abstract: In recent years, diffusion models have revolutionized visual generation, outperforming traditional frameworks like Generative Adversarial Networks (GANs). However, generating images of humans with realistic semantic parts, such as hands and faces, remains a significant challenge due to their intricate structural complexity. To address this issue, we propose a novel post-processing solution named RealisHuman. The RealisHuman framework operates in two stages. First, it generates realistic human parts, such as hands or faces, using the original malformed parts as references, ensuring consistent details with the original image. Second, it seamlessly integrates the rectified human parts back into their corresponding positions by repainting the surrounding areas to ensure smooth and realistic blending. The RealisHuman framework significantly enhances the realism of human generation, as demonstrated by notable improvements in both qualitative and quantitative metrics. Code is available at https://github.com/Wangbenzhi/RealisHuman.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the malformed rendering of complex human body parts, such as hands and faces, in generated images, particularly images of people produced by diffusion models. Although diffusion models have made remarkable progress in visual generation, they still struggle with realistic hand and face details: these parts are structurally complex and prone to morphological errors and distortion. To solve this problem, the paper proposes a post-processing solution named RealisHuman. The RealisHuman framework operates in two stages:

1. **Generate realistic human body parts**: Using the original malformed parts as a reference, generate more realistic parts such as hands or faces that stay consistent with the original image, including details such as skin color and texture.
2. **Seamless fusion**: Repaint the surrounding area so that the rectified parts are integrated back into their corresponding positions in the original image with a smooth, realistic transition.

Together, these two stages significantly improve the realism of human body parts in generated images, as shown by both qualitative and quantitative metrics. The paper also describes the method in detail, including data preparation, the use of the Part Detail Encoder to extract detail information, and the seamless fusion step. The experimental results show that RealisHuman improves the realism of hands and faces in generated images, especially when restoring detail in small regions.
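The overall shape of a two-stage "rectify, then blend back" post-process can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`crop_box`, `feather_mask`, `refine_part`) and the `rectify` callable are hypothetical, the part generator is left as a placeholder, and a simple feathered alpha blend stands in for the diffusion-based repainting of the surrounding area that the paper describes.

```python
import numpy as np

def crop_box(image, box, pad):
    """Crop a padded region around a detected part; return the crop and the clamped box."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    x0, y0 = max(0, x0 - pad), max(0, y0 - pad)
    x1, y1 = min(w, x1 + pad), min(h, y1 + pad)
    return image[y0:y1, x0:x1].copy(), (x0, y0, x1, y1)

def feather_mask(height, width, border):
    """Weight map: 1.0 in the interior, ramping linearly down to 0.0 at the crop edges."""
    ys = np.minimum(np.arange(height), np.arange(height)[::-1])
    xs = np.minimum(np.arange(width), np.arange(width)[::-1])
    dist = np.minimum(ys[:, None], xs[None, :]).astype(np.float32)
    return np.clip(dist / float(border), 0.0, 1.0)

def refine_part(image, box, rectify, pad=8, border=4):
    """Stage 1: rectify the cropped part. Stage 2: blend it back into the image."""
    crop, (x0, y0, x1, y1) = crop_box(image, box, pad)
    # Stage 1 (placeholder): `rectify` is a hypothetical callable that returns
    # a corrected crop of the same shape, using the malformed crop as reference.
    fixed = rectify(crop)
    # Stage 2 (stand-in): feathered alpha blend instead of learned repainting.
    mask = feather_mask(y1 - y0, x1 - x0, border)[..., None]
    out = image.astype(np.float32).copy()
    out[y0:y1, x0:x1] = mask * fixed + (1.0 - mask) * out[y0:y1, x0:x1]
    return out
```

Because the blend weight falls to zero at the crop boundary, pixels on the edge of the region keep their original values, which avoids a visible seam in this simplified setting. A learned repainting step, as in the paper, can additionally adjust the surrounding content rather than only interpolating between the two images.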