StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer

Sasikarn Khwanmuang, Pakkapon Phongthawee, Patsorn Sangkloy, Supasorn Suwajanakorn
2023-06-03
Abstract: Our paper seeks to transfer the hairstyle of a reference image to an input photo for virtual hair try-on. We target a variety of challenging scenarios, such as transforming a long hairstyle with bangs to a pixie cut, which requires removing the existing hair and inferring how the forehead would look, or transferring partially visible hair from a hat-wearing person in a different pose. Past solutions leverage StyleGAN for hallucinating any missing parts and producing a seamless face-hair composite through so-called GAN inversion or projection. However, there remains a challenge in controlling the hallucinations to accurately transfer hairstyle and preserve the face shape and identity of the input. To overcome this, we propose a multi-view optimization framework that uses "two different views" of reference composites to semantically guide occluded or ambiguous regions. Our optimization shares information between two poses, which allows us to produce high-fidelity and realistic results from incomplete references. Our framework produces high-quality results and outperforms prior work in a user study that consists of significantly more challenging hair transfer scenarios than previously studied. Project page: https://stylegan-salon.github.io/
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to transfer the hairstyle in a reference image onto an input photo for virtual hair try-on while preserving the face shape and identity of the input. The authors focus on cases where the two images differ in pose, where hair length changes substantially, or where the reference hair is partly occluded: for example, converting long hair to short hair, or extracting a partially visible hairstyle from a person wearing a hat and applying it to the target person's face. Existing hairstyle transfer methods handle these cases poorly, because their result quality degrades when the poses or face shapes differ substantially. To overcome these challenges, the authors propose a multi-view optimization framework that uses two reference composites in different poses to semantically guide occluded or ambiguous regions. By sharing information between the two views, the method can generate high-fidelity and realistic results from incomplete reference images. Specific technical details include:

1. **Multi-view Optimization**: Optimize against the guide images of both views to achieve more accurate hairstyle transfer and better preservation of facial detail.
2. **3D Rotation**: Use 3D rotation to maintain geometric consistency between the views, which helps when the input and reference poses differ.
3. **Stage-by-stage Optimization**: First optimize in the W space to generate missing details, then optimize in the W+ space to restore face and hair details.
4. **Noise Regularization Loss**: Ensure that the noise maps capture only stochastic variation rather than encoding content that the latent code should capture.
5. **Latent Similarity Loss**: Pull the latent codes of the two views toward each other with an L2 loss so that information is shared between them.

With these techniques, the authors' method outperforms prior work in a user study, particularly in the more challenging hairstyle transfer scenarios, such as mismatched poses and cases where missing parts of the face or background must be filled in.
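
To make the latent similarity loss and the W to W+ hand-off listed above more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The function names, the 18 x 512 latent shape, the step size, and the choice to initialise W+ by copying the W code to every layer are all assumptions made for illustration. The StyleGAN generator and the image-based losses that drive the real optimization are omitted entirely.

```python
import numpy as np

NUM_LAYERS = 18   # assumption: number of style layers in the generator
LATENT_DIM = 512  # assumption: width of a single W code


def latent_similarity_loss(w_view_a, w_view_b):
    """Mean squared L2 distance between the latent codes of the two views."""
    diff = np.asarray(w_view_a) - np.asarray(w_view_b)
    return float(np.mean(diff ** 2))


def latent_similarity_grad(w_view_a, w_view_b):
    """Gradient of the loss w.r.t. w_view_a (the gradient for w_view_b is its negation)."""
    diff = np.asarray(w_view_a) - np.asarray(w_view_b)
    return 2.0 * diff / diff.size


def expand_w_to_w_plus(w, num_layers=NUM_LAYERS):
    """Copy one W code to every layer, a common way to initialise a W+ stage (assumed here)."""
    return np.tile(np.asarray(w)[None, :], (num_layers, 1))


# Stage 1 (W space): two views start from different codes and are pulled together.
rng = np.random.default_rng(0)
w_a = rng.normal(size=LATENT_DIM)
w_b = rng.normal(size=LATENT_DIM)

initial_loss = latent_similarity_loss(w_a, w_b)
step_size = 50.0  # hypothetical step size for this toy problem only
for _ in range(100):
    grad = latent_similarity_grad(w_a, w_b)
    w_a = w_a - step_size * grad  # descend for view A
    w_b = w_b + step_size * grad  # view B receives the opposite gradient
final_loss = latent_similarity_loss(w_a, w_b)

# Stage 2 (W+ space): each view gets one code per layer, initialised from stage 1.
w_plus_a = expand_w_to_w_plus(w_a)
w_plus_b = expand_w_to_w_plus(w_b)
```

In a full pipeline this term would be one weighted component of a larger objective, so the two codes would stay close without collapsing to the same point, which is what lets each view keep its own pose while sharing hair and face information.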