$\infty$-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions

Minh-Quan Le, Alexandros Graikos, Srikar Yellapragada, Rajarsi Gupta, Joel Saltz, Dimitris Samaras
2024-07-20
Abstract: Synthesizing high-resolution images from intricate, domain-specific information remains a significant challenge in generative modeling, particularly for applications in large-image domains such as digital histopathology and remote sensing. Existing methods face critical limitations: conditional diffusion models in pixel or latent space cannot exceed the resolution on which they were trained without losing fidelity, and computational demands increase significantly for larger image sizes. Patch-based methods offer computational efficiency but fail to capture long-range spatial relationships due to their overreliance on local information. In this paper, we introduce a novel conditional diffusion model in infinite dimensions, $\infty$-Brush, for controllable large image synthesis. We propose a cross-attention neural operator to enable conditioning in function space. Our model overcomes the constraints of traditional finite-dimensional diffusion models and patch-based methods, offering scalability and superior capability in preserving global image structures while maintaining fine details. To the best of our knowledge, $\infty$-Brush is the first conditional diffusion model in function space that can controllably synthesize images at arbitrary resolutions of up to $4096\times4096$ pixels. The code is available at https://github.com/cvlab-stonybrook/infinity-brush.
Computer Vision and Pattern Recognition
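The abstract states only that conditioning in function space is done with a cross-attention neural operator. The sketch below illustrates the general idea and is not the authors' architecture: it assumes each query point carries the noisy pixel value plus a Fourier embedding of its (x, y) coordinate in [0, 1]², and that the conditioning arrives as a set of M embedding vectors. The names `fourier_features`, `cross_attention_operator`, and the weight matrices are hypothetical. The property it demonstrates is that the learned weights never depend on the number of query points, so the same operator can be evaluated on any set of coordinates, i.e. at any resolution.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fourier_features(coords, n_freqs=4):
    """Hypothetical positional encoding of coordinates in [0, 1]^2.

    coords: (N, 2) -> returns (N, 4 * n_freqs) sin/cos features.
    """
    freqs = (2.0 ** np.arange(n_freqs)) * np.pi      # (F,)
    ang = coords[:, :, None] * freqs                  # (N, 2, F)
    feats = np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)  # (N, 2, 2F)
    return feats.reshape(len(coords), -1)             # (N, 4F)

def cross_attention_operator(query_feats, cond_tokens, W_q, W_k, W_v):
    """Single-head cross-attention from query points to conditioning tokens.

    query_feats: (N, d_in) features at N arbitrary query coordinates
    cond_tokens: (M, d_c) conditioning embeddings (assumed input format)
    Returns (N, d_out). N is free: no weight shape depends on it.
    """
    q = query_feats @ W_q                             # (N, d_k)
    k = cond_tokens @ W_k                             # (M, d_k)
    v = cond_tokens @ W_v                             # (M, d_out)
    scores = q @ k.T / np.sqrt(q.shape[-1])           # (N, M)
    return softmax(scores, axis=-1) @ v               # (N, d_out)

# Usage: the same (random, untrained) weights applied at two "resolutions"
rng = np.random.default_rng(0)
d_in, d_k, d_c, d_out = 3 + 16, 32, 64, 3             # RGB + 16 Fourier features
W_q = rng.normal(size=(d_in, d_k)) / np.sqrt(d_in)
W_k = rng.normal(size=(d_c, d_k)) / np.sqrt(d_c)
W_v = rng.normal(size=(d_c, d_out)) / np.sqrt(d_c)
cond = rng.normal(size=(8, d_c))                      # 8 hypothetical conditioning embeddings

for n in (100, 5000):
    coords = rng.uniform(size=(n, 2))
    noisy = rng.normal(size=(n, 3))                   # stand-in for noisy pixel values
    feats = np.concatenate([noisy, fourier_features(coords)], axis=-1)
    out = cross_attention_operator(feats, cond, W_q, W_k, W_v)
    print(out.shape)
```

Because attention here is computed independently for each query point, evaluating a subset of points gives exactly the corresponding rows of the full evaluation. A real model would stack several such layers with other components; this sketch isolates only the resolution-agnostic conditioning step.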
What problem does this paper attempt to address?
This paper addresses the key problems of generating high-resolution images, especially when generation must be conditioned on complex, domain-specific information. Specifically, it targets the following challenges:

1. **Limitations of existing methods**:
   - **Finite-dimensional diffusion models**: These models operate in pixel or latent space and cannot exceed the resolution used during training without losing image quality.
   - **Patch-based methods**: Although computationally efficient, they struggle to capture long-range spatial relationships because they rely too heavily on local information.
2. **The need for large-image generation**:
   - Applications such as digital pathology and remote sensing require very large images (e.g., 4096×4096 pixels), which existing methods struggle to generate.

To solve these problems, the authors propose **∞-Brush**, a novel conditional diffusion model that operates in an infinite-dimensional function space and can controllably generate large images at arbitrary resolutions. Its main features are:

- **Cross-attention neural operator**: To enable conditioning in function space, the authors introduce a cross-attention neural operator, which captures fine-grained details while preserving the global structure.
- **Scalability and efficiency**: Because the model is trained on only a small fraction (about 0.4%) of each image's pixels, it can be trained on datasets of very large images and can generate images of up to 4096×4096 pixels.
- **First conditional diffusion model in function space**: To the authors' knowledge, it is the first diffusion model to perform conditional generation in function space, overcoming the resolution limits of traditional finite-dimensional models.
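To make the "about 0.4% of the pixels" figure concrete: a 4096×4096 image has 16,777,216 pixels, so 0.4% is roughly 67,109 pixels per image. The sketch below shows one plausible way to draw such a subset, assuming uniform random sampling without replacement and coordinates normalized to [0, 1]² at pixel centers; the paper's actual sampling scheme may differ, and `num_training_points` and `sample_training_points` are hypothetical helpers. A training step would then compute the denoising loss only at these sampled points.

```python
import numpy as np

def num_training_points(height, width, fraction=0.004):
    # Number of pixels kept per image; 0.004 (0.4%) is an illustrative default
    return max(1, round(fraction * height * width))

def sample_training_points(image, fraction=0.004, rng=None):
    """Hypothetical helper: sample a random pixel subset for one training step.

    image: (H, W, C) array.
    Returns coords (N, 2) in [0, 1]^2 and the pixel values (N, C) there.
    """
    rng = np.random.default_rng() if rng is None else rng
    H, W, _ = image.shape
    n = num_training_points(H, W, fraction)
    flat_idx = rng.choice(H * W, size=n, replace=False)  # distinct pixels
    rows, cols = np.divmod(flat_idx, W)
    # Pixel centers in normalized coordinates, independent of resolution
    coords = np.stack([(rows + 0.5) / H, (cols + 0.5) / W], axis=-1)
    return coords, image[rows, cols]

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
coords, values = sample_training_points(image, rng=rng)
print(coords.shape, values.shape)    # (1049, 2) (1049, 3)
print(num_training_points(4096, 4096))  # 67109
```

Normalizing coordinates to [0, 1]² means the same model sees a 512×512 crop and a 4096×4096 image in a shared coordinate system, which is what lets a function-space model be queried at a different resolution at sampling time.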
Overall, by proposing **∞-Brush**, the paper aims to overcome the limitations of existing generative models in producing high-resolution large images, especially when generation must be conditioned on complex domain-specific information.