Hierarchical Cross-Attention Network for Virtual Try-On

Hao Tang, Bin Ren, Pingping Wu, Nicu Sebe
2024-11-23
Abstract: In this paper, we present an innovative solution for the challenges of the virtual try-on task: our novel Hierarchical Cross-Attention Network (HCANet). HCANet is crafted with two primary stages: geometric matching and try-on, each playing a crucial role in delivering realistic virtual try-on outcomes. A key feature of HCANet is the incorporation of a novel Hierarchical Cross-Attention (HCA) block into both stages, enabling the effective capture of long-range correlations between individual and clothing modalities. The HCA block enhances the depth and robustness of the network. By adopting a hierarchical approach, it facilitates a nuanced representation of the interaction between the person and clothing, capturing intricate details essential for an authentic virtual try-on experience. Our experiments establish the prowess of HCANet. The results showcase its performance across both quantitative metrics and subjective evaluations of visual realism. HCANet stands out as a state-of-the-art solution, demonstrating its capability to generate virtual try-on results that excel in accuracy and realism. This marks a significant step in advancing virtual try-on technologies.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of generating realistic, visually convincing virtual try-on results. It identifies two deficiencies in current virtual try-on methods:

1. **Long-range global correlations are not modeled**: Existing virtual try-on methods do not fully account for long-range global correlations between the human body representation and the clothing representation. As a result, clothing and body are paired inconsistently in the generated images. For example, the clothing may appear to float on the body, or its texture may not align with the body contour.
2. **Geometry and appearance consistency**: Virtual try-on must not only place the clothing correctly on the body but also keep the clothing's appearance consistent across different poses. Existing methods are unsatisfactory in this respect.

To address these problems, the paper proposes the Hierarchical Cross-Attention Network (HCANet), which works in two main stages: geometric matching and try-on.

### Geometric Matching Stage
- **Purpose**: Accurately align the in-shop clothing with the target person.
- **Method**: Use a trainable thin-plate spline transformation to achieve precise alignment.

### Try-on Stage
- **Purpose**: Use the aligned clothing and the person representations to generate a pose-consistent image and a combined mask.
- **Method**: Incorporate the aligned clothing into the person image through the combined mask, so that the composite image is smooth and visually coherent.

### Hierarchical Cross-Attention Block (HCA Block)
- **Function**: Capture long-range global correlations between the person and clothing modalities.
- **Structure**:
  - **First stage**: Fuse the different person representations with a cross-attention mechanism so that relevant features are combined effectively.
  - **Second stage**: Fuse the person and clothing representations with parallel cross-attention, establishing a hierarchical relationship between the two stages and fusing the information jointly.

### Experimental Results
- **Quantitative evaluation**: HCANet outperforms existing methods on multiple objective metrics.
- **Qualitative evaluation**: In subjective evaluations, the generated try-on results are also rated highly for visual realism.

In conclusion, by introducing a hierarchical cross-attention mechanism, HCANet addresses both the missing long-range global correlations and the geometry and appearance consistency problem in virtual try-on, improving the realism and accuracy of the results.
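
To make the two-stage structure of the HCA block more concrete, the following is a minimal, dependency-free sketch of single-head scaled dot-product cross-attention and one possible way to wire it hierarchically. It is not the authors' implementation: the summary does not specify the projections, number of heads, residual connections, or how the parallel branches are merged, so the function names (`cross_attention`, `hierarchical_fusion`), the residual addition, and the toy token names (`pose_tokens`, `parsing_tokens`, `cloth_tokens`) are all assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (no learned projections).

    queries: n_q vectors of length d (tokens from one modality)
    keys:    n_k vectors of length d (tokens from the other modality)
    values:  n_k vectors of length d_v
    Returns (outputs, weights): n_q vectors of length d_v, and the
    n_q x n_k attention weights.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, so any query token can
        # attend to any key token regardless of spatial distance.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        out = [sum(w[j] * values[j][c] for j in range(len(values)))
               for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


def hierarchical_fusion(person_a, person_b, cloth):
    """Hypothetical two-stage wiring; the details are assumptions."""
    # Stage 1: one person representation attends to another.
    attended, _ = cross_attention(person_a, person_b, person_b)
    # Residual addition (an assumption, not stated in the summary).
    person = [[x + y for x, y in zip(ra, rb)]
              for ra, rb in zip(attended, person_a)]
    # Stage 2: two cross-attentions computed in parallel, one per direction.
    person_to_cloth, _ = cross_attention(person, cloth, cloth)
    cloth_to_person, _ = cross_attention(cloth, person, person)
    return person_to_cloth, cloth_to_person


# Toy tokens (hypothetical): 3 pose tokens, 2 parsing tokens, 2 clothing tokens.
pose_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
parsing_tokens = [[0.5, 0.5], [1.0, 0.0]]
cloth_tokens = [[1.0, 0.0], [0.0, 1.0]]

fused, attn = cross_attention(pose_tokens, cloth_tokens, cloth_tokens)
p2c, c2p = hierarchical_fusion(pose_tokens, parsing_tokens, cloth_tokens)
```

Each row of `attn` is a probability distribution over the clothing tokens, which is what lets a person token draw on clothing features from anywhere in the image. A practical block would add learned query, key, and value projections and multiple heads.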
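
The try-on stage described above blends the aligned clothing into the person image through a mask. The summary does not give the exact formula, so the sketch below assumes the simplest common form: a per-pixel convex combination, `mask * cloth + (1 - mask) * rendered`, with mask values in [0, 1]. The function name `compose` and the toy 2x2 grayscale images are hypothetical.

```python
def compose(warped_cloth, rendered_person, mask):
    """Blend two same-sized grayscale images pixel by pixel.

    Where mask is 1 the warped clothing is kept, where it is 0 the
    rendered person is kept, and values in between mix the two.
    """
    return [
        [m * c + (1.0 - m) * r for c, r, m in zip(c_row, r_row, m_row)]
        for c_row, r_row, m_row in zip(warped_cloth, rendered_person, mask)
    ]


# Toy 2x2 example (hypothetical values).
warped_cloth = [[0.8, 0.8], [0.8, 0.8]]
rendered_person = [[0.2, 0.2], [0.2, 0.2]]
mask = [[1.0, 0.0], [0.5, 0.25]]

composite = compose(warped_cloth, rendered_person, mask)
```

A soft mask (values strictly between 0 and 1) near the garment boundary is what would give the smooth transition the summary refers to. A hard 0/1 mask would instead produce a visible seam.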