Improving the Conditional Fine-grained Image Generation with Part Perception

Xuan Han, Mingyu You, Ping Lu
DOI: https://doi.org/10.1109/tmm.2023.3326649
IF: 7.3
2024-01-01
IEEE Transactions on Multimedia
Abstract: Synthesizing images that match a given condition is a central problem in image generation. Fine-grained conditional image generation, with its emphasis on the fidelity of details, is of particular value to research in this field. To learn the conditional distribution of the data, the model must discriminate the class semantics of generated samples. However, most existing methods do so solely on the basis of a condensed global feature, which can impede the model's focus on detailed local features and in turn cause inaccurate or unstable local appearances in generated images. In this context, we propose PartGAN, which features a novel part perception mechanism that strengthens the model's concentration on the details of fine-grained objects. In the proposed method, the image given to the discriminator is decomposed and encoded into a set of embeddings that represent the semantics of its parts. This scheme not only helps the model capture discriminative local features more accurately, but also prevents the omission of other, general local features. Under the newly designed condition loss term, every part of a generated image is equally encouraged to be closer to the corresponding real part, which helps ensure that the general parts have a stable appearance that conforms to the class semantics. Experiments on popular benchmarks show that the proposed method significantly improves the generation of fine-grained images.