Fashion Style Editing with Generative Human Prior

Chaerin Kong, Seungyong Lee, Soohyeok Im, Wonsuk Yang
2024-04-02
Abstract: Image editing has been a long-standing challenge in the research community with its far-reaching impact on numerous applications. Recently, text-driven methods started to deliver promising results in domains like human faces, but their applications to more complex domains have been relatively limited. In this work, we explore the task of fashion style editing, where we aim to manipulate the fashion style of human imagery using text descriptions. Specifically, we leverage a generative human prior and achieve fashion style editing by navigating its learned latent space. We first verify that the existing text-driven editing methods fall short for our problem due to their overly simplified guidance signal, and propose two directions to reinforce the guidance: textual augmentation and visual referencing. Combined with our empirical findings on the latent space structure, our Fashion Style Editing framework (FaSE) successfully projects abstract fashion concepts onto human images and introduces exciting new applications to the field.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses fashion style editing: manipulating the clothing style of people in images using text descriptions. Existing text-driven editing methods perform well on human faces but have seen limited application to more complex domains such as full-body human images. The authors propose the Fashion Style Editing framework (FaSE), which leverages a generative human prior and edits fashion style by navigating its learned latent space. They first verify that existing text-driven methods fall short on this task because their guidance signal is overly simplified, and then propose two ways to reinforce it: (1) textual augmentation, which queries a language model for descriptions of the target concept; and (2) visual referencing, which retrieves reference images from a constructed fashion database and uses them as more descriptive guidance. Combined with the authors' empirical findings on the latent space structure, this reinforced guidance lets FaSE project abstract fashion concepts onto human images and supports a range of applications.
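
To make the idea of a reinforced guidance signal more concrete, here is a minimal sketch, not the authors' implementation. It assumes that the augmented text descriptions and the retrieved reference images have already been embedded into a shared vector space (for example by a CLIP-style encoder), and it uses random vectors in place of real embeddings. The function names `build_guidance` and `guidance_loss`, the `text_weight` parameter, and the cosine-based loss are all hypothetical choices for illustration; the summary above does not specify the actual objective.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_guidance(text_embs, image_embs, text_weight=0.5):
    """Blend text and image embeddings into a single unit-length target.

    text_embs:  (n_text, d) embeddings of augmented descriptions (assumed given).
    image_embs: (n_img, d) embeddings of retrieved reference images (assumed given).
    text_weight: hypothetical mixing weight between the two guidance sources.
    """
    text_center = l2_normalize(l2_normalize(text_embs).mean(axis=0))
    image_center = l2_normalize(l2_normalize(image_embs).mean(axis=0))
    return l2_normalize(text_weight * text_center + (1.0 - text_weight) * image_center)


def guidance_loss(edited_emb, target):
    """Cosine distance between the edited image's embedding and the target."""
    return 1.0 - float(l2_normalize(edited_emb) @ target)


# Toy stand-ins for real embeddings (assumption: dimension 16, random values).
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(4, 16))   # e.g. four augmented descriptions
image_embs = rng.normal(size=(3, 16))  # e.g. three retrieved reference images
target = build_guidance(text_embs, image_embs)
loss = guidance_loss(rng.normal(size=16), target)
```

In a full editing pipeline, a loss of this kind would be minimized with respect to the latent code of the generative prior, so that the embedding of the generated image moves toward the blended target. That optimization loop is omitted here because it depends on the generator and encoder, which the summary does not describe in detail.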