Diffusion Model with Perceptual Loss

Shanchuan Lin, Xiao Yang

2024-03-07

Abstract: Diffusion models trained with mean squared error loss tend to generate unrealistic samples. Current state-of-the-art models rely on classifier-free guidance to improve sample quality, yet its surprising effectiveness is not fully understood. In this paper, we show that the effectiveness of classifier-free guidance partly originates from it being a form of implicit perceptual guidance. As a result, we can directly incorporate perceptual loss in diffusion training to improve sample quality. Since the score matching objective used in diffusion training strongly resembles the denoising autoencoder objective used in unsupervised training of perceptual networks, the diffusion model itself is a perceptual network and can be used to generate meaningful perceptual loss. We propose a novel self-perceptual objective that results in diffusion models capable of generating more realistic samples. For conditional generation, our method only improves sample quality without entanglement with the conditional input and therefore does not sacrifice sample diversity. Our method can also improve sample quality for unconditional generation, which was not possible with classifier-free guidance before.
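
For readers unfamiliar with the classifier-free guidance mentioned in the abstract, the commonly used formulation extrapolates from the unconditional noise prediction toward the conditional one. The snippet below is a minimal sketch of that standard rule, not code from the paper; the function name `cfg_combine`, the parameter name `scale`, and the toy arrays are illustrative assumptions.

```python
import numpy as np


def cfg_combine(eps_uncond, eps_cond, scale):
    """Classifier-free guidance in its common form.

    scale = 0 returns the unconditional prediction, scale = 1 returns the
    conditional prediction, and scale > 1 extrapolates past it.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)


# Toy predictions (made-up numbers, for illustration only).
eps_uncond = np.array([0.0, 1.0])
eps_cond = np.array([1.0, 1.0])
guided = cfg_combine(eps_uncond, eps_cond, 3.0)
```

Because the rule needs both a conditional and an unconditional prediction, it has nothing to extrapolate between when the model is purely unconditional, which is the limitation the abstract refers to.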

Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning

What problem does this paper attempt to address?

This paper addresses the tendency of diffusion models trained with mean squared error loss to generate unrealistic samples. Current state-of-the-art models rely on Classifier-Free Guidance (CFG) to improve sample quality, but why CFG works so well is not fully understood. The authors show that its effectiveness stems in part from CFG acting as a form of implicit perceptual guidance. They therefore propose applying a perceptual loss directly during diffusion training. Because the diffusion model itself can serve as the perceptual network, they introduce the Self-Perceptual Objective, which lets the model generate more realistic samples without sacrificing sample diversity in conditional generation. The method also improves sample quality in unconditional generation, which was not previously possible with CFG. The paper includes ablation studies on hyperparameters, time steps, distance functions, and model structures, and compares the results with CFG. Although the Self-Perceptual Objective does not outperform CFG in every respect, it offers a new way to improve quality that is independent of the conditional input.
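
The idea of using the diffusion model itself as the perceptual network can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the `ToyDenoiser` class, the choice of hidden-layer activations as "perceptual features", the re-noising of both the real and the predicted sample at a second noise level, and the mean squared feature distance are all assumptions made for the example, and timestep conditioning is omitted for brevity.

```python
import numpy as np


def add_noise(x0, noise, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise


class ToyDenoiser:
    """Hypothetical stand-in for a diffusion network (one hidden layer, predicts noise)."""

    def __init__(self, dim, hidden, rng):
        self.w1 = rng.normal(size=(dim, hidden)) / np.sqrt(dim)
        self.w2 = rng.normal(size=(hidden, dim)) / np.sqrt(hidden)

    def features(self, x_t):
        # Hidden activations, used here as the "perceptual" representation.
        return np.tanh(x_t @ self.w1)

    def predict_noise(self, x_t):
        return self.features(x_t) @ self.w2


def predict_x0(x_t, eps_pred, alpha_bar):
    """Invert the forward process using the predicted noise."""
    return (x_t - np.sqrt(1.0 - alpha_bar) * eps_pred) / np.sqrt(alpha_bar)


def feature_distance(frozen, x0, x0_pred, alpha_bar, rng):
    """Re-noise both samples with the same noise and compare frozen-network features."""
    noise = rng.normal(size=x0.shape)
    feat_real = frozen.features(add_noise(x0, noise, alpha_bar))
    feat_pred = frozen.features(add_noise(x0_pred, noise, alpha_bar))
    return float(np.mean((feat_real - feat_pred) ** 2))


def self_perceptual_loss(online, frozen, x0, alpha_bar_t, alpha_bar_s, rng):
    """Denoise with the online model, then score the result in the frozen model's feature space."""
    noise = rng.normal(size=x0.shape)
    x_t = add_noise(x0, noise, alpha_bar_t)
    x0_pred = predict_x0(x_t, online.predict_noise(x_t), alpha_bar_t)
    return feature_distance(frozen, x0, x0_pred, alpha_bar_s, rng)


rng = np.random.default_rng(0)
online = ToyDenoiser(8, 16, rng)   # model being trained
frozen = ToyDenoiser(8, 16, rng)   # assumption: in practice a frozen pretrained copy
x0 = rng.normal(size=(4, 8))       # toy batch of "clean" samples
loss = self_perceptual_loss(online, frozen, x0, 0.5, 0.7, rng)
```

Note that nothing in this loss depends on a conditioning input, which is consistent with the summary's point that the objective works for unconditional generation as well.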

Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training

Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models

DiffLoss: unleashing diffusion model as constraint for training image restoration network

Diffusion Model for Generative Image Denoising

Unlearnable Examples for Diffusion Models: Protect Data from Unauthorized Exploitation

Improving Sample Quality of Diffusion Models Using Self-Attention Guidance

DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception

Boosting Latent Diffusion with Perceptual Objectives

Unmasking Bias in Diffusion Model Training

HumanDiffusion: diffusion model using perceptual gradients

Guiding a Diffusion Model with a Bad Version of Itself

Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment

Training Diffusion Models with Reinforcement Learning

Entropy-Driven Sampling and Training Scheme for Conditional Diffusion Generation

Diffusion Models With Learned Adaptive Noise

Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion

Enhancing Sample Generation of Diffusion Models using Noise Level Correction

BudgetFusion: Perceptually-Guided Adaptive Diffusion Models

Understanding and Improving Training-free Loss-based Diffusion Guidance