Learning to Generate 3D-Aware Realistic Hand from 2D and 3D Priors

Kai Lu, Haoran Zhang, Dejun Zhu, Wankou Yang
DOI: https://doi.org/10.23919/ccc63176.2024.10662843
2024-01-01
Abstract: Unsupervised learning of 3D-aware Generative Adversarial Networks (GANs) from large collections of unstructured single-view images is a growing research area. Recent 3D GANs achieve photorealistic synthesis and multi-view consistency when generating radiance fields of human faces and bodies. However, these methods have not yet effectively addressed human hands, primarily because diverse hand gestures and extensive self-occlusion make the distribution of hand poses harder to learn. In this paper, we advance photorealistic 3D-aware image synthesis for articulated human hands. The key contribution of our model is the synthesis of high-quality 3D hand avatars that incorporate intricate geometric details and capture fine hand features, such as skin wrinkles, more naturally than previous approaches. We present a novel framework for representing articulated human hands that integrates a heatmap-based feature generator, a tri-plane feature volume representation, and a mesh-guided feature volume deformation method. Our experiments show that the method outperforms preceding 3D-aware and articulation-aware approaches in generating complex human hands. We validate the effectiveness of our model and the importance of each component through systematic ablation studies, and demonstrate state-of-the-art 3D-aware synthesis on InterHand2.6M, achieving an FID of 7.53, a $33\%$ improvement over the previous state of the art.
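
To make the tri-plane feature volume named in the abstract more concrete, the following is a minimal sketch of how a generic tri-plane lookup is commonly done in 3D-aware GANs: a 3D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three results are summed. This is not the authors' implementation. The function names `bilinear_sample` and `sample_triplane`, the plane ordering (XY, XZ, YZ), the `[-1, 1]` coordinate range, and aggregation by summation are all assumptions made for illustration; the heatmap-based generator and mesh-guided deformation are not modeled here.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly sample a (C, H, W) feature plane at N points.

    u indexes the width axis and v the height axis, both in [-1, 1].
    Returns an (N, C) array. Assumes H >= 2 and W >= 2.
    """
    _, H, W = plane.shape
    # Map normalized coordinates to continuous pixel coordinates.
    x = (u + 1.0) * 0.5 * (W - 1)
    y = (v + 1.0) * 0.5 * (H - 1)
    # Top-left corner of the enclosing cell, kept inside the grid.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    # Interpolation weights within the cell.
    wx = np.clip(x - x0, 0.0, 1.0)
    wy = np.clip(y - y0, 0.0, 1.0)
    top = plane[:, y0, x0] * (1 - wx) + plane[:, y0, x0 + 1] * wx
    bottom = plane[:, y0 + 1, x0] * (1 - wx) + plane[:, y0 + 1, x0 + 1] * wx
    return (top * (1 - wy) + bottom * wy).T


def sample_triplane(planes, points):
    """Look up features for (N, 3) points from three (C, H, W) planes.

    planes is ordered (XY, XZ, YZ); this ordering and the sum
    aggregation are illustrative assumptions, not the paper's design.
    """
    xy, xz, yz = planes
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    return (
        bilinear_sample(xy, x, y)
        + bilinear_sample(xz, x, z)
        + bilinear_sample(yz, y, z)
    )


# Usage: random planes with 8 channels, queried at 4 random points.
rng = np.random.default_rng(0)
planes = rng.standard_normal((3, 8, 16, 16))
points = rng.uniform(-1.0, 1.0, size=(4, 3))
feats = sample_triplane(planes, points)
print(feats.shape)  # (4, 8)
```

In a full pipeline, the per-point features would typically be decoded by a small network into density and color for volume rendering; a deformation step such as the mesh-guided one described in the abstract would map query points from the posed hand into the space where the planes are defined before this lookup.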