CS-VITON: a realistic virtual try-on network based on clothing region alignment and SPM

Jinguang Chen, Xin Zhang, Lili Ma, Bo Yang, Kaibing Zhang
DOI: https://doi.org/10.1007/s00371-024-03347-w
IF: 2.835
2024-03-31
The Visual Computer
Abstract: Image-based virtual try-on involves generating an image of a person wearing a given clothing item. Existing virtual try-on methods suffer from misaligned regions between the predicted segmentation map and the deformed clothing, and the generated try-on results look unnatural. To address this issue, we refine the definition of the misaligned regions and propose a high-resolution virtual try-on network called CS-VITON. The network adopts a two-stage strategy. The first stage, the condition generator, predicts the target segmentation map while deforming the clothing into a shape that matches the human body. A component that measures the difference between the generated segmentation map and the mask of the deformed clothing is added to the loss function of the network. This component is well matched to the tasks of this stage and results in more reasonable necklines and skin boundaries. The second stage, the try-on generator, modulates the generation of try-on images using residual blocks built on style-preserved modulation (SPM). The modulation process takes into account the specific contextual style of the image, which improves the realism of the try-on results. Extensive experiments on a common high-resolution virtual try-on dataset demonstrate that our method yields more realistic virtual try-on results. Metrics such as kernel inception distance also showed some improvement. The code will be available soon at https://github.com/xinz626/CS-VITON.
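The abstract does not give the exact form of the loss component that compares the generated segmentation map with the deformed clothing mask. As a rough illustration only, the sketch below assumes a simple mean absolute difference between the clothing channel of the predicted segmentation and the warped clothing mask. The function name `misalignment_loss`, the `cloth_channel` parameter, and the L1 form are all hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def misalignment_loss(seg_probs, warped_mask, cloth_channel=0):
    """Assumed L1-style penalty; NOT the authors' published formula.

    seg_probs:     array of shape (C, H, W) with per-class probabilities
                   from a (hypothetical) condition generator.
    warped_mask:   array of shape (H, W) with values in [0, 1], the mask
                   of the deformed clothing.
    cloth_channel: index of the clothing class in seg_probs (assumption).
    """
    seg_probs = np.asarray(seg_probs, dtype=np.float64)
    warped_mask = np.asarray(warped_mask, dtype=np.float64)
    if seg_probs.ndim != 3 or seg_probs.shape[1:] != warped_mask.shape:
        raise ValueError("seg_probs must be (C, H, W) and warped_mask (H, W)")

    # Probability that each pixel belongs to the clothing region.
    cloth_prob = seg_probs[cloth_channel]

    # Average per-pixel disagreement between the two clothing estimates.
    return float(np.mean(np.abs(cloth_prob - warped_mask)))


if __name__ == "__main__":
    # Toy 2x2 example with two classes: clothing (0) and background (1).
    seg = np.array([[[1.0, 0.0], [0.0, 0.0]],
                    [[0.0, 1.0], [1.0, 1.0]]])
    mask = np.array([[1.0, 1.0], [0.0, 0.0]])
    print(misalignment_loss(seg, mask))
```

A term like this would be zero when the predicted clothing region and the warped mask agree everywhere and would grow with the area of disagreement, which is the behaviour the abstract describes at a high level.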
computer science, software engineering