Abstract: Diffusion-based generative models have achieved remarkable success in various domains. They train a shared model on denoising tasks that encompass different noise levels simultaneously, representing a form of multi-task learning (MTL). However, analyzing and improving diffusion models from an MTL perspective remains under-explored. In particular, MTL can sometimes lead to the well-known phenomenon of negative transfer, which results in the performance degradation of certain tasks due to conflicts between tasks. In this paper, we first aim to analyze diffusion training from an MTL standpoint, presenting two key observations: (O1) the task affinity between denoising tasks diminishes as the gap between noise levels widens, and (O2) negative transfer can arise even in diffusion training. Building upon these observations, we aim to enhance diffusion training by mitigating negative transfer. To achieve this, we propose leveraging existing MTL methods, but the huge number of denoising tasks makes it computationally expensive to calculate the necessary per-task loss or gradient. To address this challenge, we propose clustering the denoising tasks into small task clusters and applying MTL methods to them. Specifically, based on (O2), we employ interval clustering to enforce temporal proximity among denoising tasks within clusters. We show that interval clustering can be solved using dynamic programming, utilizing signal-to-noise ratio, timestep, and task affinity for clustering objectives. Through this, our approach addresses the issue of negative transfer in diffusion models by allowing for efficient computation of MTL methods. We validate the efficacy of the proposed clustering and its integration with MTL methods through various experiments, demonstrating 1) improved generation quality and 2) faster training convergence of diffusion models.

What problem does this paper attempt to address?

The paper attempts to address negative transfer in the training of diffusion models. Diffusion training can be viewed as multi-task learning (MTL): a single shared model is trained on denoising tasks at different noise levels. During this process, conflicts may arise between denoising tasks and degrade the performance of some of them, which is known as negative transfer. The main goal of the paper is to analyze the multi-task learning characteristics of diffusion models and to propose methods that mitigate negative transfer, thereby improving generation quality and training convergence speed.

### Main Contributions of the Paper:

1. **Analysis of Multi-task Learning Characteristics of Diffusion Models**:
   - Observed that task affinity between denoising tasks decreases as the gap between their noise levels increases.
   - Found that negative transfer does occur in diffusion training: learning all denoising tasks simultaneously can degrade performance on the tasks within a specific timestep interval.
2. **Proposed Methods to Mitigate Negative Transfer**:
   - Utilized existing multi-task learning techniques that handle conflicting gradients, gradient magnitude differences, and loss scale imbalances.
   - Proposed clustering the large number of denoising tasks into a small number of task clusters and applying multi-task learning methods to these clusters to reduce computational overhead.
3. **Design of Specific Clustering Strategies**:
   - Performed interval clustering based on timesteps, signal-to-noise ratio (SNR), and task affinity scores.
   - Used dynamic programming to solve the interval clustering problem, so that tasks within each cluster have high task affinity.
4. **Experimental Validation**:
   - Conducted extensive experiments on multiple datasets (such as FFHQ, CelebA-HQ, and ImageNet) to validate the effectiveness of the proposed methods.
   - Experimental results showed that introducing multi-task learning methods significantly improves the quality of generated images and the training convergence speed.

### Main Observations:

- **Task Affinity Analysis**: Denoising tasks at adjacent timesteps have high task affinity, while tasks with larger timestep gaps have lower affinity.
- **Negative Transfer Analysis**: Negative transfer occurs in certain task groups, and its impact is more pronounced when the timestep gap is large.

### Method Overview:

1. **Interval Clustering**:
   - Cluster all denoising tasks based on timesteps, signal-to-noise ratio, or task affinity to form small task clusters.
   - Use dynamic programming to optimize the clustering, ensuring high affinity among tasks within each cluster.
2. **Application of Multi-task Learning Methods**:
   - Apply multi-task learning methods such as PCGrad, Nash-MTL, and Uncertainty Weighting to the task clusters.
   - Handle conflicting gradients, gradient magnitude differences, and loss scale imbalances through projected gradients, Nash bargaining solutions, and task-dependent uncertainty, respectively.

### Experimental Results:

- **Unconditional Generation**: On the FFHQ and CelebA-HQ datasets, FID scores decreased significantly after applying multi-task learning methods, and the quality and accuracy of generated images improved.
- **Conditional Generation**: On the ImageNet dataset, similar effects were observed, with improvements in the quality and diversity of generated images.

In summary, the paper significantly improves the performance of diffusion models by analyzing the negative transfer phenomenon and proposing effective mitigation methods.
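
The interval clustering step described above can be illustrated with a small dynamic program. This is only a sketch under stated assumptions, not the paper's implementation: the function name `interval_cluster` is hypothetical, and the cost used here (within-interval squared deviation of a per-timestep value such as the timestep index or log-SNR) is an assumed stand-in for the clustering objectives the paper describes.

```python
from typing import List, Sequence, Tuple


def interval_cluster(values: Sequence[float], k: int) -> List[Tuple[int, int]]:
    """Split indices 0..n-1 into k contiguous intervals (hypothetical helper).

    Minimizes the summed within-interval squared deviation from the interval
    mean. `values` holds one number per denoising task, e.g. the timestep or
    its log-SNR (an assumed choice of clustering feature).
    Returns inclusive (start, end) index pairs, in order.
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= len(values)")

    # Prefix sums of v and v^2 give each interval's cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(lo: int, hi: int) -> float:
        # Squared deviation from the mean over the half-open range [lo, hi).
        total = s1[hi] - s1[lo]
        return (s2[hi] - s2[lo]) - total * total / (hi - lo)

    inf = float("inf")
    # best[c][i]: minimum cost of splitting the first i items into c intervals.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for c in range(1, k + 1):
        for i in range(c, n + 1):
            for j in range(c - 1, i):  # j = start index of the last interval
                cand = best[c - 1][j] + cost(j, i)
                if cand < best[c][i]:
                    best[c][i] = cand
                    cut[c][i] = j

    # Walk the stored cut points backwards to recover the intervals.
    bounds = []
    i = n
    for c in range(k, 0, -1):
        j = cut[c][i]
        bounds.append((j, i - 1))
        i = j
    return bounds[::-1]
```

The triple loop costs O(k·n²), which is small for a typical diffusion schedule of about a thousand timesteps and a handful of clusters. Because every cluster is a contiguous interval, tasks in the same cluster are always temporally adjacent, which is the property the summary attributes to interval clustering.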
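
To make the "projected gradients" idea concrete, here is a minimal sketch of PCGrad-style gradient surgery on flattened per-cluster gradients. It assumes each task cluster's gradient is already available as a plain list of floats; the function names `pcgrad` and `_dot` are illustrative, and how the gradients are obtained from the diffusion loss is left out.

```python
import random
from typing import List, Optional, Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def pcgrad(
    grads: Sequence[Sequence[float]],
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Combine per-cluster gradients with PCGrad-style projection (sketch).

    For each gradient g_i, visit the other gradients in random order and,
    whenever g_i conflicts with g_j (negative inner product), remove from g_i
    its component along g_j. The projected gradients are then summed.
    """
    rng = rng or random.Random(0)
    projected = []
    for i, g in enumerate(grads):
        g_i = list(g)
        others = [j for j in range(len(grads)) if j != i]
        rng.shuffle(others)
        for j in others:
            g_j = grads[j]  # always project against the original g_j
            inner = _dot(g_i, g_j)
            if inner < 0.0:
                scale = inner / _dot(g_j, g_j)
                g_i = [x - scale * y for x, y in zip(g_i, g_j)]
        projected.append(g_i)
    # Sum the projected gradients coordinate-wise.
    return [sum(col) for col in zip(*projected)]
```

With a small number of clusters instead of one task per timestep, the pairwise projection loop stays cheap, which is the motivation the summary gives for clustering before applying MTL methods.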

Addressing Negative Transfer in Diffusion Models
