FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes

Ziying Pan, Kun Wang, Gang Li, Feihong He, Yongxuan Lai

2024-06-04

Abstract: Class-conditional image generation based on diffusion models is renowned for producing high-quality and diverse images. However, most prior efforts focus on generating images for general categories, e.g., the 1,000 classes of ImageNet-1k. A more challenging task, large-scale fine-grained image generation, remains a largely unexplored frontier. In this work, we present a parameter-efficient strategy, called FineDiffusion, that fine-tunes large pre-trained diffusion models to scale to fine-grained image generation with 10,000 categories. FineDiffusion significantly accelerates training and reduces storage overhead by fine-tuning only the parameters of the tiered class embedder, the bias terms, and the normalization layers. To further improve generation quality for fine-grained categories, we propose a novel sampling method that replaces conventional classifier-free guidance with superclass-conditioned guidance, specifically tailored for fine-grained categories. Compared to full fine-tuning, FineDiffusion achieves a 1.56x training speed-up and requires storing merely 1.77% of the total model parameters, while achieving a state-of-the-art FID of 9.776 on image generation with 10,000 classes. Extensive qualitative and quantitative experiments demonstrate the superiority of our method over other parameter-efficient fine-tuning methods. The code and more generated results are available on our project website: https://finediffusion.github.io/.
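The parameter-efficient recipe above amounts to freezing every weight except three named groups. The following is a minimal sketch, not the authors' code: it assumes parameters are identified by name and that the hypothetical markers `class_embedder`, `bias`, and `norm` pick out the trainable groups (real checkpoints may use different names). The toy shapes are invented for illustration and do not reproduce the paper's 1.77% figure.

```python
from math import prod
from typing import Dict, List, Tuple

# Hypothetical name markers for the trainable groups named in the abstract:
# the tiered class embedder, bias terms, and normalization layers.
TRAINABLE_MARKERS = ("class_embedder", "bias", "norm")


def is_trainable(name: str) -> bool:
    """Return True if a parameter name matches one of the trainable groups."""
    return any(marker in name for marker in TRAINABLE_MARKERS)


def select_trainable(shapes: Dict[str, Tuple[int, ...]]) -> List[str]:
    """Names of the parameters that would be fine-tuned; all others stay frozen."""
    return sorted(name for name in shapes if is_trainable(name))


def trainable_fraction(shapes: Dict[str, Tuple[int, ...]]) -> float:
    """Fraction of all parameter entries that must be trained and stored."""
    total = sum(prod(shape) for shape in shapes.values())
    trainable = sum(prod(shape) for name, shape in shapes.items() if is_trainable(name))
    return trainable / total


# Toy, invented parameter table for a single transformer block plus an embedder.
shapes = {
    "blocks.0.attn.qkv.weight": (384, 128),
    "blocks.0.attn.qkv.bias": (384,),
    "blocks.0.norm1.weight": (128,),
    "blocks.0.mlp.fc1.weight": (512, 128),
    "blocks.0.mlp.fc1.bias": (512,),
    "class_embedder.sub_table.weight": (100, 128),
}
print(select_trainable(shapes))
print(round(trainable_fraction(shapes), 4))
```

Because only the selected entries change, a fine-tuned variant can be stored as a small delta on top of the shared pre-trained weights, which is where the storage saving comes from.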

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper addresses large-scale fine-grained image generation. Existing class-conditional image generation methods based on diffusion models mainly target general categories (such as the 1,000 categories of ImageNet-1k), whereas fine-grained generation at the scale of 10,000 categories still poses significant challenges:

1. **High computational resource demand**: Training a large-scale fine-grained image generation model from scratch requires substantial compute and training time.
2. **Large storage overhead**: Fully fine-tuning a pre-trained model produces an entire new copy of its parameters that must be stored.
3. **Poor generation quality**: Existing methods struggle to capture the subtle differences between fine-grained categories, which limits the quality and diversity of the generated images.

To address these challenges, the authors propose FineDiffusion, a parameter-efficient fine-tuning strategy. By fine-tuning only the parameters of the TieredEmbedder (the tiered class embedder), the bias terms, and the normalization layers, it significantly accelerates training and reduces storage overhead. The authors also introduce a sampling method that replaces conventional classifier-free guidance with superclass-conditioned guidance to improve the quality of fine-grained image generation.
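Conventional classifier-free guidance extrapolates from an unconditional noise prediction toward the class-conditional one, eps = eps_uncond + s * (eps_cond - eps_uncond). The sketch below assumes that superclass-conditioned guidance keeps this form but uses the superclass-conditioned prediction as the baseline in place of the unconditional one; the paper's exact formulation may differ. The denoiser interface, `toy_eps`, and the fine-to-superclass mapping are hypothetical stand-ins, and plain lists replace tensors so the example runs without extra libraries.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]
# Hypothetical denoiser interface: (x_t, t, fine_label, super_label) -> noise estimate.
# fine_label=None means "condition on the superclass only".
Denoiser = Callable[[Vector, int, Optional[int], int], Vector]


def superclass_guided_eps(model: Denoiser, x_t: Vector, t: int, fine_label: int,
                          superclass_of: Dict[int, int], scale: float) -> Vector:
    """Guide toward a fine class, using its superclass prediction as the baseline.

    Same shape as classifier-free guidance, base + scale * (cond - base), but the
    base prediction is conditioned on the superclass instead of being unconditional.
    """
    sup = superclass_of[fine_label]
    eps_fine = model(x_t, t, fine_label, sup)  # fine class plus its superclass
    eps_sup = model(x_t, t, None, sup)         # superclass only
    return [base + scale * (cond - base) for cond, base in zip(eps_fine, eps_sup)]


def toy_eps(x_t: Vector, t: int, fine_label: Optional[int], super_label: int) -> Vector:
    """Hypothetical denoiser: shifts x_t by label-dependent offsets (ignores t)."""
    offset = 0.1 * super_label + (0.01 * fine_label if fine_label is not None else 0.0)
    return [v + offset for v in x_t]


# Hypothetical mapping: fine-grained class 7 belongs to superclass 2.
guided = superclass_guided_eps(toy_eps, [0.0, 1.0], t=10, fine_label=7,
                               superclass_of={7: 2}, scale=4.0)
print(guided)
```

With `scale=1.0` the result equals the fine-class prediction, and with `scale=0.0` it falls back to the superclass prediction; larger scales push samples further along the direction that separates a fine-grained class from its siblings under the same superclass.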
