Abstract: In recent years, diffusion models have revolutionized visual generation, outperforming traditional frameworks like Generative Adversarial Networks (GANs). However, generating images of humans with realistic semantic parts, such as hands and faces, remains a significant challenge due to their intricate structural complexity. To address this issue, we propose a novel post-processing solution named RealisHuman. The RealisHuman framework operates in two stages. First, it generates realistic human parts, such as hands or faces, using the original malformed parts as references, ensuring consistent details with the original image. Second, it seamlessly integrates the rectified human parts back into their corresponding positions by repainting the surrounding areas to ensure smooth and realistic blending. The RealisHuman framework significantly enhances the realism of human generation, as demonstrated by notable improvements in both qualitative and quantitative metrics. Code is available at https://github.com/Wangbenzhi/RealisHuman.

What problem does this paper attempt to address?

This paper addresses the malformed rendering of structurally complex human body parts, such as hands and faces, in generated images, particularly those produced by diffusion models. Although diffusion models have made remarkable progress in visual generation, they still struggle to produce realistic hands and faces: these parts have high structural complexity and are prone to morphological errors and distortion. To address this, the paper proposes a post-processing solution named RealisHuman, which operates in two stages:

1. **Generate realistic human body parts**: Using the original malformed parts as a reference, generate more realistic parts such as hands or faces while keeping details such as skin color and texture consistent with the original image.
2. **Seamless fusion**: Repaint the surrounding area so that the rectified parts integrate into their corresponding positions in the original image with a smooth, realistic blend.

Together, these two stages improve the realism of human body parts in generated images, as shown by both qualitative and quantitative evaluation. The paper also describes the method's technical details, including data preparation, the Part Detail Encoder used to extract detailed information, and the seamless fusion step. According to the experimental results, RealisHuman improves the realism of hands and faces in generated images, especially when restoring detail in small regions.
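The geometric bookkeeping behind the second stage can be illustrated with a minimal sketch. It is not the authors' implementation: it contains no generative model, and it only shows how a rectified part might be pasted back into its bounding box and how a mask of the surrounding band (the region a repainting model would then fill) could be computed. The function names `paste_part` and `surrounding_mask`, the `(top, left, bottom, right)` box convention, and the `margin` parameter are all assumptions made for illustration.

```python
def paste_part(image, part, box):
    """Return a copy of `image` with `part` pasted into `box`.

    `image` and `part` are lists of rows (grayscale values);
    `box` is (top, left, bottom, right), end-exclusive. Hypothetical helper.
    """
    top, left, bottom, right = box
    if len(part) != bottom - top or any(len(row) != right - left for row in part):
        raise ValueError("part size must match the box size")
    out = [row[:] for row in image]  # copy so the input image is not modified
    for r in range(top, bottom):
        out[r][left:right] = part[r - top]
    return out


def surrounding_mask(height, width, box, margin):
    """Return a boolean mask that is True in a band of `margin` pixels
    around `box` and False inside the box and everywhere else.

    The band is the area a repainting step would regenerate (an assumption
    about how "repainting the surrounding areas" could be set up).
    """
    top, left, bottom, right = box
    mask = [[False] * width for _ in range(height)]
    for r in range(max(top - margin, 0), min(bottom + margin, height)):
        for c in range(max(left - margin, 0), min(right + margin, width)):
            inside_box = top <= r < bottom and left <= c < right
            mask[r][c] = not inside_box
    return mask


# Toy example: an 8x8 black image, a 2x2 white "rectified part".
image = [[0] * 8 for _ in range(8)]
part = [[255, 255], [255, 255]]
box = (3, 3, 5, 5)

composited = paste_part(image, part, box)
mask = surrounding_mask(8, 8, box, margin=1)
```

In this sketch the pasted region is left untouched by the mask, so a downstream repainting step would only alter the band around the part. How wide that band should be, and how the repainting is conditioned, are details the summary above does not specify.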

RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images

HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance

Realistic Face Reenactment Via Self-Supervised Disentangling of Identity and Pose

HandRefiner: Refining Malformed Hands in Generated Images by Diffusion-based Conditional Inpainting

Two Birds with One Stone: Transforming and Generating Facial Images with Iterative GAN

From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation

Two Birds with One Stone: Iteratively Learn Facial Attributes with GANs.

Diffusion-HPC: Generating Synthetic Images with Realistic Humans

HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion

Giving a Hand to Diffusion Models: a Two-Stage Approach to Improving Conditional Human Image Generation

3D-Aware Semantic-Guided Generative Model for Human Synthesis.

Photo-Realistic and Robust Inpainting of Faces Using Refinement GANs

PSHuman: Photorealistic Single-view Human Reconstruction using Cross-Scale Diffusion

Human 3Diffusion: Realistic Avatar Creation via Explicit 3D Consistent Diffusion Models

Digi2Real: Bridging the Realism Gap in Synthetic Data Face Recognition via Foundation Models

HumanGen: Generating Human Radiance Fields with Explicit Priors

AvatarMe++: Facial Shape and BRDF Inference With Photorealistic Rendering-Aware GANs

Semantic-Aware Human Object Interaction Image Generation

NeuralReshaper: Single-image Human-body Retouching with Deep Neural Networks

Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body

InceptionHuman: Controllable Prompt-to-NeRF for Photorealistic 3D Human Generation