Abstract: Diffusion models have become a prevalent framework in deep generative modeling across various modalities. However, despite producing high-quality results, these models are computationally expensive and suffer from slow convergence. In this work, we address these challenges in image generation by leveraging the wavelet domain, which decomposes images into low- and high-frequency components, each at half the resolution of the original image in both height and width. We observe that prioritizing the learning of low-frequency components over high-frequency details and masking out unnecessary high-frequency content in wavelet space can significantly enhance training convergence and reduce computational demands. This strategy reduces the complexity associated with high-frequency details during training, allowing the model to capture the most representative features of the data distribution while maintaining a balance in detail preservation. To facilitate controlled learning across different wavelet coefficients, we employ a multitask loss function, with each task corresponding to the learning of a distinct wavelet subband. Additionally, to ensure consistency among wavelet coefficients, which is crucial for accurate reconstruction in pixel space, we introduce a multispectral cross-attention mechanism to aid the joint generation of different wavelet coefficients. The sampling process involves jointly generating wavelet coefficients, followed by an inverse wavelet transform to convert them back to pixel space. Our approach not only improves the training efficiency for unconditional image generation compared with the standard denoising diffusion probabilistic model (vanilla DDPM) but also uniquely supports the generation of high-frequency content conditioned on a low-resolution image, enabling both image generation and upsampling within a single model. To our knowledge, this capability is novel.
Our model demonstrates superior performance in image generation compared with baseline models on the STL-10 dataset, as evidenced by improved Fréchet inception distance (FID) and recall scores.
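
As a rough illustration of the wavelet decomposition and per-subband loss described in the abstract, the following Python sketch implements a single-level 2D Haar transform directly in NumPy. The choice of the Haar wavelet, the function names (`haar_dwt2`, `haar_idwt2`, `weighted_subband_loss`), the use of mean squared error, and the weight values are all assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np


def haar_dwt2(img):
    """Single-level 2D Haar transform of an (H, W) array with even H and W.

    Returns four subbands, each of shape (H/2, W/2): one low-frequency
    approximation and three high-frequency detail bands.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    low = (a + b + c + d) / 2.0          # low-frequency approximation
    detail_cols = (a - b + c - d) / 2.0  # differences between columns
    detail_rows = (a + b - c - d) / 2.0  # differences between rows
    detail_diag = (a - b - c + d) / 2.0  # diagonal differences
    return low, detail_cols, detail_rows, detail_diag


def haar_idwt2(low, detail_cols, detail_rows, detail_diag):
    """Inverse of haar_dwt2: rebuild the (H, W) image from four subbands."""
    h, w = low.shape
    img = np.empty((2 * h, 2 * w), dtype=low.dtype)
    img[0::2, 0::2] = (low + detail_cols + detail_rows + detail_diag) / 2.0
    img[0::2, 1::2] = (low - detail_cols + detail_rows - detail_diag) / 2.0
    img[1::2, 0::2] = (low + detail_cols - detail_rows - detail_diag) / 2.0
    img[1::2, 1::2] = (low - detail_cols - detail_rows + detail_diag) / 2.0
    return img


def weighted_subband_loss(pred, target, weights):
    """Weighted sum of per-subband MSE terms, one 'task' per subband.

    The weights are hypothetical; a larger weight on the first (low-frequency)
    subband mimics prioritizing low-frequency learning.
    """
    return sum(
        w * np.mean((p - t) ** 2) for p, t, w in zip(pred, target, weights)
    )


# Decompose a random 8x8 "image" and check that the inverse recovers it.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
subbands = haar_dwt2(image)
reconstructed = haar_idwt2(*subbands)

# Illustrative weights only: emphasize the low-frequency subband.
example_weights = (1.0, 0.25, 0.25, 0.25)
noisy = tuple(s + 0.1 for s in subbands)
loss = weighted_subband_loss(noisy, subbands, example_weights)
```

Each subband has half the height and half the width of the input, matching the description in the abstract, and the inverse transform maps the four subbands back to pixel space. In the actual model the subbands would be generated jointly by the diffusion network rather than computed from a known image.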

Diffusion model with disentangled modulations for sharpening multispectral and hyperspectral images

DDRF: Denoising Diffusion Model for Remote Sensing Image Fusion

RGB Images Enhancing Hyperspectral Image Denoising with Diffusion Model

A Noise-Model-Free Hyperspectral Image Denoising Method Based on Diffusion Model

SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening

MultiSpectral diffusion: joint generation of wavelet coefficients for image synthesis and upsampling

DDFM: Denoising Diffusion Model for Multi-Modality Image Fusion

Hyperspectral and Multispectral Image Fusion Using the Conditional Denoising Diffusion Probabilistic Model

Stimulating Diffusion Model for Image Denoising via Adaptive Embedding and Ensembling

Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models

Diff-IF: Multi-modality image fusion via diffusion model with fusion knowledge prior

SpectralDiff: A Generative Framework for Hyperspectral Image Classification with Diffusion Models

Multi-Domain Multi-Scale Diffusion Model for Low-Light Image Enhancement

Hierarchical Integration Diffusion Model for Realistic Image Deblurring

Image Denoising via Multi-scale Nonlinear Diffusion Models

Semantic information guided diffusion posterior sampling for remote sensing image fusion

PDDM: Prior-Guided Dual-Branch Diffusion Model for Pansharpening

Conditional Diffusion Models for Weakly Supervised Medical Image Segmentation

P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation