Abstract: Synthesizing high-resolution images from intricate, domain-specific information remains a significant challenge in generative modeling, particularly for applications in large-image domains such as digital histopathology and remote sensing. Existing methods face critical limitations: conditional diffusion models in pixel or latent space cannot exceed the resolution on which they were trained without losing fidelity, and computational demands increase significantly for larger image sizes. Patch-based methods offer computational efficiency but fail to capture long-range spatial relationships due to their overreliance on local information. In this paper, we introduce a novel conditional diffusion model in infinite dimensions, $\infty$-Brush, for controllable large image synthesis. We propose a cross-attention neural operator to enable conditioning in function space. Our model overcomes the constraints of traditional finite-dimensional diffusion models and patch-based methods, offering scalability and superior capability in preserving global image structures while maintaining fine details. To the best of our knowledge, $\infty$-Brush is the first conditional diffusion model in function space that can controllably synthesize images at arbitrary resolutions of up to $4096\times4096$ pixels. The code is available at https://github.com/cvlab-stonybrook/infinity-brush.

What problem does this paper attempt to address?

This paper addresses key problems in generating high-resolution images, especially when generation must be conditioned on complex, domain-specific information. Specifically, it targets the following challenges:

1. **Limitations of Existing Methods**:
   - **Finite-Dimensional Diffusion Models**: These models operate in pixel or latent space and cannot exceed the resolution used during training without losing image quality. Their computational cost also grows significantly with image size.
   - **Patch-Based Methods**: Although computationally efficient, they have difficulty capturing long-range spatial relationships because they rely too heavily on local information.
2. **The Need for Large-Image Generation**:
   - In applications such as digital pathology and remote sensing, very large images (such as 4096×4096 pixels) need to be generated, and existing methods have difficulty meeting this need.

To solve these problems, the authors propose a novel conditional diffusion model, **∞-Brush**, which operates in an infinite-dimensional function space and can controllably generate large images at arbitrary resolutions. The main features of this model are:

- **Cross-Attention Neural Operator**: To achieve conditioning in function space, the authors introduce a cross-attention neural operator, which can capture fine-grained details while maintaining the global structure.
- **Scalability and Efficiency**: By training on only a small fraction (about 0.4%) of the image pixels, the model can be trained on very large image datasets and can generate images of up to 4096×4096 pixels.
- **The First Conditional Diffusion Model in Function Space**: According to the authors, this is the first diffusion model that can perform conditional generation in function space, thus breaking through the limitations of traditional finite-dimensional models.

Overall, this paper aims to overcome the limitations of existing generative models in synthesizing high-resolution large images by proposing the **∞-Brush** model, especially in cases where generation must be conditioned on complex domain information.
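To make the idea of training on a small fraction of pixels more concrete, the sketch below draws a random 0.4% subset of pixel locations from an image and returns them as coordinate and value pairs, which is the usual way an image is treated as samples of a function. This is only an illustration under stated assumptions: the helper name `sample_pixels`, the uniform random sampling, the normalization of coordinates to [0, 1], and the synthetic 256×256 image are all hypothetical and are not taken from the paper or its repository. As plain arithmetic, 0.4% of a 4096×4096 grid would be roughly 67,000 points.

```python
import numpy as np


def sample_pixels(image, fraction, rng):
    """Return normalized (y, x) coordinates and values for a random pixel subset.

    Hypothetical helper for illustration; not the paper's sampling procedure.
    """
    height, width = image.shape[:2]
    total = height * width
    count = max(1, int(round(total * fraction)))
    # Sample flat indices without replacement so no pixel is picked twice.
    flat_idx = rng.choice(total, size=count, replace=False)
    rows, cols = np.unravel_index(flat_idx, (height, width))
    # Map pixel centers into (0, 1) so coordinates do not depend on resolution.
    coords = np.stack([(rows + 0.5) / height, (cols + 0.5) / width], axis=1)
    values = image[rows, cols]
    return coords, values


rng = np.random.default_rng(0)
image = rng.random((256, 256, 3))  # synthetic stand-in for a real image
coords, values = sample_pixels(image, fraction=0.004, rng=rng)
print(coords.shape, values.shape)
```

Because the coordinates are normalized rather than tied to a pixel grid, the same representation can in principle be queried on a denser grid at generation time, which is the intuition behind resolution-independent models.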

$\infty$-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions

DiffInfinite: Large Mask-Image Synthesis via Parallel Random Patch Diffusion in Histopathology

DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance

Diffusion Brush: A Latent Diffusion Model-based Editing Tool for AI-generated Images

Novel 3D-Aware Composition Images Synthesis for Object Display with Diffusion Model

ZoomLDM: Latent Diffusion Model for multi-scale image generation

FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis

Multilevel Diffusion: Infinite Dimensional Score-Based Diffusion Models for Image Generation

One Diffusion to Generate Them All

High-Resolution Image Editing via Multi-Stage Blended Diffusion

Infinite Texture: Text-guided High Resolution Diffusion Texture Synthesis

High-Resolution Image Synthesis with Latent Diffusion Models

DiffSketching: Sketch Control Image Synthesis with Diffusion Models

Image Neural Field Diffusion Models

Diffusion Models with Anisotropic Gaussian Splatting for Image Inpainting

GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis

Boosting Latent Diffusion with Flow Matching

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes

Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder

Diffusion-based image inpainting with internal learning