Abstract: While image generation with diffusion models has achieved great success, generating images of higher resolution than the training size remains a challenging task due to the high computational cost. Current methods typically perform the entire sampling process at full resolution and process all frequency components simultaneously, which contradicts the inherent coarse-to-fine nature of latent diffusion models and wastes computation on premature high-frequency details at early diffusion stages. To address this issue, we introduce an efficient $\textbf{Fre}$quency-aware $\textbf{Ca}$scaded $\textbf{S}$ampling framework, $\textbf{FreCaS}$ for short, for higher-resolution image generation. FreCaS decomposes the sampling process into cascaded stages with gradually increasing resolutions, progressively expanding the frequency bands and refining the corresponding details. We propose an innovative frequency-aware classifier-free guidance (FA-CFG) strategy that assigns different guidance strengths to different frequency components, directing the diffusion model to add new details in the expanded frequency domain of each stage. Additionally, we fuse the cross-attention maps of the previous and current stages to avoid synthesizing unfaithful layouts. Experiments demonstrate that FreCaS significantly outperforms state-of-the-art methods in image quality and generation speed. In particular, when generating a 2048$\times$2048 image with a pre-trained SDXL model, FreCaS is about 2.86$\times$ and 6.07$\times$ faster than ScaleCrafter and DemoFusion and achieves FID$_b$ improvements of 11.6 and 3.7, respectively. FreCaS can be easily extended to more complex models such as SD3.
The source code of FreCaS can be found at https://github.com/xtudbxk/FreCaS.
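
To make the FA-CFG idea concrete, the following is a minimal sketch of guidance with separate strengths for low- and high-frequency components. It is not taken from the FreCaS code base: the FFT-based low-pass split, the function names `split_frequencies` and `fa_cfg`, and the parameters `w_low`, `w_high`, and `cutoff` are all assumptions made for illustration, and the authors' implementation may separate frequency bands differently.

```python
import numpy as np

def split_frequencies(x, cutoff):
    """Split a 2-D array into low- and high-frequency parts.

    Uses an ideal (box) low-pass mask in the FFT domain. `cutoff` is a
    hypothetical parameter: the fraction of the Nyquist frequency kept
    in the low band.
    """
    h, w = x.shape
    fy = np.fft.fftfreq(h)[:, None]  # cycles per sample, in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]
    mask = (np.abs(fy) <= 0.5 * cutoff) & (np.abs(fx) <= 0.5 * cutoff)
    low = np.fft.ifft2(np.fft.fft2(x) * mask).real
    return low, x - low  # the two parts sum back to x

def fa_cfg(eps_uncond, eps_cond, w_low, w_high, cutoff=0.5):
    """Classifier-free guidance with per-band guidance strengths (illustrative).

    The guidance direction (conditional minus unconditional prediction)
    is split into two bands, and each band gets its own scale.
    """
    delta = eps_cond - eps_uncond
    d_low, d_high = split_frequencies(delta, cutoff)
    return eps_uncond + w_low * d_low + w_high * d_high
```

With `w_low == w_high`, this sketch reduces to standard classifier-free guidance; choosing `w_high > w_low` puts more guidance on the higher band, which is one way to read the abstract's description of adding new details in the expanded frequency domain of each stage.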

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

Optimizing the Quality of Fourier Single-Pixel Imaging Via Generative Adversarial Network

DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling

Deep Fourier-based Arbitrary-scale Super-resolution for Real-time Rendering

ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

Frequency-Domain Refinement with Multiscale Diffusion for Super Resolution

HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models

An Image Arbitrary-Scale Super-Resolution Network Using Frequency-domain Information

MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers

FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model

FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes

FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion

Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation

Patched Denoising Diffusion Models For High-Resolution Image Synthesis

Upsample Guidance: Scale Up Diffusion Models without Training

UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks

One-step Generative Diffusion for Realistic Extreme Image Rescaling