TAGE: Trustworthy Attribute Group Editing for Stable Few-shot Image Generation

Ruicheng Zhang, Guoheng Huang, Yejing Huo, Xiaochen Yuan, Zhizhen Zhou, Xuhang Chen, Guo Zhong
2024-10-23
Abstract: Generative Adversarial Networks (GANs) have emerged as a prominent research focus for image editing tasks, leveraging the powerful image generation capabilities of the GAN framework to produce remarkable results. However, prevailing approaches are contingent upon extensive training datasets and explicit supervision, presenting a significant challenge in manipulating the diverse attributes of new image classes with limited sample availability. To surmount this hurdle, we introduce TAGE, an innovative image generation network comprising three integral modules: the Codebook Learning Module (CLM), the Code Prediction Module (CPM) and the Prompt-driven Semantic Module (PSM). The CLM module delves into the semantic dimensions of category-agnostic attributes, encapsulating them within a discrete codebook. This module is predicated on the concept that images are assemblages of attributes, and thus, by editing these category-independent attributes, it is theoretically possible to generate images from unseen categories. Subsequently, the CPM module facilitates naturalistic image editing by predicting indices of category-independent attribute vectors within the codebook. Additionally, the PSM module generates semantic cues that are seamlessly integrated into the Transformer architecture of the CPM, enhancing the model's comprehension of the targeted attributes for editing. With these semantic cues, the model can generate images that accentuate desired attributes more prominently while maintaining the integrity of the original category, even with a limited number of samples. We have conducted extensive experiments utilizing the Animal Faces, Flowers, and VGGFaces datasets.
The results of these experiments demonstrate that our proposed method not only achieves superior performance but also exhibits a high degree of stability when compared to other few-shot image generation techniques.
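The abstract describes editing as selecting category-independent attribute vectors from a discrete codebook by index (CLM stores the vectors, CPM predicts the indices). The toy sketch below illustrates that idea under stated assumptions: it uses a vector-quantization-style nearest-neighbour lookup to map a latent vector to a codebook index, and an additive AGE-style edit that shifts a latent code along the selected directions. The function names `quantize` and `edit_latent`, the 2-D codebook, and the additive editing rule are all hypothetical; the paper's actual networks and latent space are not reproduced here.

```python
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry nearest to z (vector-quantization lookup)."""
    return min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))


def edit_latent(
    w: Sequence[float],
    codebook: Sequence[Sequence[float]],
    indices: Sequence[int],
    strengths: Sequence[float],
) -> Vector:
    """Shift latent code w along the codebook directions picked by `indices`.

    Hypothetical additive editing rule:
    w' = w + sum_i strengths[i] * codebook[indices[i]]
    """
    edited = list(w)
    for k, s in zip(indices, strengths):
        for d, value in enumerate(codebook[k]):
            edited[d] += s * value
    return edited


# Toy 2-D codebook of attribute directions (hypothetical values).
codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(quantize([0.9, 0.2], codebook))  # 0
print(edit_latent([0.5, 0.5], codebook, [0, 1], [1.0, -0.5]))  # [1.5, 0.0]
```

Keeping the codebook discrete means an edit is fully described by a short list of integer indices plus strengths, which is what makes index prediction (the CPM's role) a tractable classification problem rather than free-form regression in latent space.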
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to stably edit and generate high-quality images of new categories when only a few samples are available. Existing image editing approaches rely on large training datasets and explicit supervision, and they struggle to edit the diverse attributes of new categories. To address this, the authors propose TAGE (Trustworthy Attribute Group Editing for Stable Few-shot Image Generation), a new image generation network.

### Core of the problem

1. **Limitations of existing methods**: Existing few-shot image generation methods fall into three categories, each with its own weaknesses:
   - **Optimization-based methods**: Although meta-learning can train a general model, such methods capture the details of unseen categories poorly, so the generated images are less realistic.
   - **Fusion-based methods**: They require the input images to be highly similar and are computationally expensive, which limits where they can be applied.
   - **Transformation-based methods**: Learning and applying intra-class variations is complex and unstable, which makes training difficult.
2. **Challenges of editing methods**: Editing methods generate images by editing attributes, but they cannot effectively edit the attributes of unseen categories. For example, AGE (Attribute Group Editing) can cause organs to disappear or the image to collapse during generation, degrading image quality.

### TAGE's solutions

TAGE addresses these problems with three key modules:

1. **Codebook Learning Module (CLM)**:
   - Identifies semantic directions from unlabeled images and stores them in a discrete codebook, so that known attributes can be recombined to generate images of unseen categories.
   - Constrains the latent space and stores high-quality reconstruction elements, which improves image quality.
2. **Code Prediction Module (CPM)**:
   - Predicts latent codes so that attributes are edited accurately even when data are limited or highly diverse.
   - Exploits global compositional information and long-range dependencies when predicting codes, which increases the diversity of generated images.
3. **Prompt-driven Semantic Module (PSM)**:
   - Generates semantic prompts that guide the CPM to perform fine-grained attribute manipulation while preserving the integrity of the original category.
   - Injects these semantically guided prompts into the Transformer layers to achieve high-quality image generation and editing.

### Experimental verification

The paper reports extensive experiments on three datasets (Animal Faces, Flowers, and VGGFaces). The results show that TAGE not only outperforms other few-shot image generation methods but is also more stable.

### Summary

TAGE targets stable, high-quality few-shot image generation. By combining a discrete codebook, a code prediction module, and a prompt-driven semantic module, it enables efficient editing and generation of images from unseen categories.
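The interplay summarized above, where PSM prompts are injected into the CPM's Transformer and the CPM predicts codebook indices, can be illustrated with a toy sketch. It assumes the prompts are prepended to the input sequence as extra tokens (a common prompt-tuning pattern, not confirmed as TAGE's exact injection mechanism), uses a single attention head with identity query/key/value projections instead of learned weights, and scores each output token against the codebook by dot product. All function names, shapes, and values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(seq: Sequence[Sequence[float]]) -> List[Vector]:
    """Single-head scaled dot-product self-attention.

    Simplification: queries, keys and values are the inputs themselves
    (identity projections) instead of learned linear maps.
    """
    dim = len(seq[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in seq:
        weights = softmax([scale * dot(q, k) for k in seq])
        out.append([sum(w * v[d] for w, v in zip(weights, seq)) for d in range(dim)])
    return out


def predict_code_indices(
    tokens: Sequence[Sequence[float]],
    prompts: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> List[int]:
    """Prepend prompt tokens, attend over the full sequence, and return the
    highest-scoring codebook index for each original (non-prompt) token."""
    hidden = self_attention(list(prompts) + list(tokens))
    indices = []
    for h in hidden[len(prompts):]:
        logits = [dot(h, c) for c in codebook]
        indices.append(max(range(len(codebook)), key=logits.__getitem__))
    return indices


codebook = [[1.0, 0.0], [0.0, 1.0]]
print(predict_code_indices([[1.0, 0.0]], [], codebook))            # [0]
print(predict_code_indices([[1.0, 0.0]], [[0.0, 5.0]], codebook))  # [1]
```

Without a prompt, the single token simply selects the codebook entry it is closest to in direction (index 0). Adding one prompt token lets attention mix the prompt into the token's representation, which shifts the prediction to index 1; this is the kind of steering of attribute selection that semantic prompts are meant to provide, shown here only in miniature.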