Context Diffusion: In-Context Aware Image Generation

Ivona Najdenkoska, Animesh Sinha, Abhimanyu Dubey, Dhruv Mahajan, Vignesh Ramanathan, Filip Radenovic

2023-12-07

Abstract: We propose Context Diffusion, a diffusion-based framework that enables image generation models to learn from visual examples presented in context. Recent work tackles such in-context learning for image generation, where a query image is provided alongside context examples and text prompts. However, the quality and fidelity of the generated images deteriorate when the prompt is not present, demonstrating that these models are unable to truly learn from the visual context. To address this, we propose a novel framework that separates the encoding of the visual context and preserving the structure of the query images. This results in the ability to learn from the visual context and text prompts, but also from either one of them. Furthermore, we enable our model to handle few-shot settings, to effectively address diverse in-context learning scenarios. Our experiments and user study demonstrate that Context Diffusion excels in both in-domain and out-of-domain tasks, resulting in an overall enhancement in image quality and fidelity compared to counterpart models.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

The paper proposes a framework called **Context Diffusion**, aiming to address issues in existing image generation models when utilizing visual context. Specifically, the paper focuses on the following aspects:

1. **Ability to Learn from Visual Context**:
   - Existing image generation models (such as Prompt Diffusion) cannot effectively utilize visual context for image generation without text prompts. This means these models overly rely on text prompts when handling visual context.
   - Context Diffusion can generate high-quality images solely through visual context without text prompts.
2. **Support for Multiple Examples**:
   - The proposed method can support multiple context images as input, thereby enhancing the model's learning ability for different tasks. This allows the model to learn effectively in few-shot scenarios.
3. **Flexibility and Generality**:
   - Context Diffusion not only performs well in in-domain tasks but also achieves good generalization in out-of-domain tasks. For example, it can handle tasks such as sketch-to-image and editing.
4. **Improved Performance**:
   - Experimental validation shows that Context Diffusion outperforms existing models (such as Prompt Diffusion) under various conditions (including visual context with text prompts, visual context only, and text prompts only).

In summary, the main goal of this paper is to develop a model that can utilize visual context for high-quality image generation under various conditions and demonstrate its superior performance in multiple tasks.
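The three conditioning modes named in the summary (visual context with text, visual context only, text only) and the few-shot case can be illustrated with a small sketch. This is not the paper's implementation: the function names `average_context` and `build_conditioning`, the use of plain lists as embeddings, and the choice of mean pooling over the context images are all assumptions made for illustration. The sketch shows only how a conditioning sequence might be assembled so that either input can be omitted; the query image is deliberately left out, since the abstract states that its structure is handled separately from the context encoding.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def average_context(context_embeddings: Sequence[Vector]) -> Vector:
    """Pool K per-image embeddings into one vector by element-wise mean.

    Mean pooling is an assumed choice for this sketch, not a detail
    taken from the paper.
    """
    if not context_embeddings:
        raise ValueError("need at least one context embedding")
    dim = len(context_embeddings[0])
    if any(len(e) != dim for e in context_embeddings):
        raise ValueError("all context embeddings must share one dimension")
    k = len(context_embeddings)
    return [sum(e[i] for e in context_embeddings) / k for i in range(dim)]


def build_conditioning(
    text_tokens: Optional[Sequence[Vector]] = None,
    context_embeddings: Optional[Sequence[Vector]] = None,
) -> List[Vector]:
    """Assemble a conditioning sequence from text and/or visual context.

    Hypothetical helper: text tokens come first, followed by one pooled
    visual-context token. Either input may be omitted, but not both.
    The query image is not handled here; it would follow a separate
    structure-preserving pathway.
    """
    tokens: List[Vector] = []
    if text_tokens:
        tokens.extend(list(t) for t in text_tokens)
    if context_embeddings:
        tokens.append(average_context(context_embeddings))
    if not tokens:
        raise ValueError("provide text tokens, context embeddings, or both")
    return tokens


# Two context images (few-shot, K = 2) plus a one-token prompt.
both = build_conditioning([[1.0, 0.0]], [[0.0, 2.0], [2.0, 4.0]])
# Visual context only, no text prompt.
context_only = build_conditioning(context_embeddings=[[0.0, 2.0]])
# Text prompt only, no visual context.
text_only = build_conditioning(text_tokens=[[1.0, 0.0]])
```

Under these assumptions, adding more context images changes only the pooled token, so the same code path covers one-shot and few-shot inputs.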

Contextualized Diffusion Models for Text-Guided Image and Video Generation

In-Context Learning Unlocked for Diffusion Models

Improving Diffusion-Based Image Synthesis with Context Prediction

Improving In-Context Learning in Diffusion Models with Visual Context-Modulated Prompts

Cross-Modal Contextualized Diffusion Models for Text-Guided Visual Generation and Editing

Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis

GLoD: Composing Global Contexts and Local Details in Image Generation

Multi-Concept Customization of Text-to-Image Diffusion

One Diffusion to Generate Them All

Prompt-Free Diffusion: Taking "text" out of Text-to-Image Diffusion Models

Nested Diffusion Processes for Anytime Image Generation

RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation

Diffusion idea exploration for art generation

Controlled and Conditional Text to Image Generation with Diffusion Prior

Fast Semantic Diffusion for Large-Scale Context-Based Image and Video Annotation

Diffusion Cocktail: Mixing Domain-Specific Diffusion Models for Diversified Image Generations

SceneGenie: Scene Graph Guided Diffusion Models for Image Synthesis

CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis

Explore In-Context Segmentation via Latent Diffusion Models

Conditional Diffusion on Web-Scale Image Pairs leads to Diverse Image Variations