VisioBlend: Sketch and Stroke-Guided Denoising Diffusion Probabilistic Model for Realistic Image Generation

Harshkumar Devmurari, Gautham Kuckian, Prajjwal Vishwakarma, Krunali Vartak
2024-05-15
Abstract: Generating images from hand-drawings is a crucial and fundamental task in content creation. The translation is challenging due to the infinite possibilities and the diverse expectations of users. However, traditional methods are often limited by the availability of training data. Therefore, VisioBlend, a unified framework supporting three-dimensional control over image synthesis from sketches and strokes based on diffusion models, is proposed. It enables users to decide the level of faithfulness to the input strokes and sketches. VisioBlend achieves state-of-the-art performance in terms of realism and flexibility, enabling various applications in image synthesis from sketches and strokes. It solves the problem of data availability by synthesizing new data points from hand-drawn sketches and strokes, enriching the dataset and enabling more robust and diverse image synthesis. This work showcases the power of diffusion models in image creation, offering a user-friendly and versatile approach for turning artistic visions into reality.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The main problem this paper attempts to address is the challenge of converting hand-drawn sketches and brushstrokes into realistic images. Specifically, traditional methods have the following limitations when generating high-quality images:

1. **Data Availability**: Traditional methods rely on large amounts of annotated data, which are difficult to obtain, especially in specific domains or application scenarios.
2. **Flexibility and Controllability**: Existing generative models (such as GANs) require a separate model for each task, so they lack flexibility and fine control over the generated images.
3. **Generation Quality**: Existing methods often produce unrealistic images, especially when the inputs are complex and diverse.

To address these issues, the paper proposes a unified framework called VisioBlend, which is based on diffusion models and generates high-quality images from sketches and brushstrokes. The main features of VisioBlend include:

- **Unified Framework**: Handles both sketches and brushstrokes in a single model, simplifying the architecture and improving efficiency and flexibility.
- **Diffusion Models**: Noise is added to training images step by step, and the model learns to reverse this corruption; generation then proceeds by iteratively removing noise, which lets the model capture the probability distribution of images.
- **Flexible Editing**: Users can edit the generated images by drawing outlines and colors, without the need for complex editing tools.
- **High Performance**: Achieves state-of-the-art image synthesis quality and stability, validated through quantitative metrics (such as FID and LPIPS) and qualitative user studies.
- **Wide Applications**: Supports applications such as multi-domain translation, multi-condition local editing, and region-sensitive brushstroke-to-image generation.
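The noise-adding half of a diffusion model, mentioned in the feature list above, can be illustrated with a minimal NumPy sketch. It follows the standard DDPM closed-form forward process, not VisioBlend's own implementation: the linear schedule, the step count, and the names `make_alpha_bar`, `add_noise`, and `estimate_x0` are assumptions chosen for illustration, and the sketch and stroke conditioning described in the paper is omitted.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear noise schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def add_noise(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) in closed form; returns (x_t, noise)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

def estimate_x0(x_t, t, alpha_bar, eps_pred):
    """Invert the forward step given a noise estimate.

    In a trained model, eps_pred would come from the denoising network.
    """
    return (x_t - np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_bar[t])

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()

# Stand-in for an image scaled to [-1, 1]: 3 channels, 8x8 pixels.
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))

# Corrupt the image to the midpoint of the schedule.
x_t, eps = add_noise(x0, 500, alpha_bar, rng)

# With the true noise, the clean image is recovered up to rounding error.
x0_hat = estimate_x0(x_t, 500, alpha_bar, eps)
```

Training a denoiser amounts to predicting `eps` from `x_t` and `t`. The better that prediction, the closer `estimate_x0` comes to the clean image, which is what makes step-by-step noise removal possible at generation time.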
In summary, VisioBlend addresses data scarcity by synthesizing new data points from hand-drawn sketches and strokes, and addresses low generation quality with a diffusion-based architecture, offering a new approach to content creation and image generation.