Diffusion idea exploration for art generation

Nikhil Verma

2023-07-11

Abstract: Cross-modal learning tasks have gained momentum in recent times. Despite a plethora of applications in diverse areas, generating novel content from multiple modalities of data remains a challenging problem. To address this, various generative modelling techniques have been proposed for specific tasks. Novel and creative image generation is important for industrial applications, where it can serve as one arm of novel content generation. Previously proposed techniques used Generative Adversarial Networks (GANs), autoregressive models and Variational Autoencoders (VAEs) to accomplish similar tasks. These approaches are limited in their ability to produce images guided by either text instructions or rough sketch images, which decreases the overall performance of the image generator. We use state-of-the-art diffusion models to generate creative art, primarily leveraging text with additional support from rough sketches. Diffusion starts with a pattern of random dots and gradually converts that pattern into a design image using the guiding information fed into the model. Diffusion models have recently outperformed other generative models in image generation tasks that use cross-modal data as guiding information. Initial experiments on this novel image generation task demonstrated promising qualitative results.
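The abstract's description of diffusion (start from random dots, then gradually convert them into an image under guidance) can be illustrated with a toy sampler. The sketch below is not the paper's implementation: it runs a standard DDPM-style reverse loop on a four-value "image" using only the Python standard library. The function names, the linear noise schedule, and in particular `oracle_noise_predictor` are assumptions made for illustration. In a real system the noise predictor is a trained neural network conditioned on a text prompt and a sketch; here it is replaced by an oracle that is handed the target directly, so that the loop can be run and checked.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta): the fraction of signal left at each step."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def oracle_noise_predictor(x_t, alpha_bar_t, guide):
    """Hypothetical stand-in for the trained, text/sketch-conditioned network.

    It assumes the guidance pins down the target exactly, so the noise in x_t
    can be solved for in closed form instead of being learned.
    """
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [(x - signal * g) / noise for x, g in zip(x_t, guide)]


def sample(guide, num_steps=1000, seed=0):
    """Reverse diffusion: begin with Gaussian noise and denoise step by step."""
    rng = random.Random(seed)
    betas = linear_beta_schedule(num_steps)
    alpha_bars = cumulative_alphas(betas)

    # The "pattern of random dots": one Gaussian sample per pixel.
    x = [rng.gauss(0.0, 1.0) for _ in guide]

    for t in reversed(range(num_steps)):
        eps = oracle_noise_predictor(x, alpha_bars[t], guide)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(1.0 - betas[t])
                for xi, ei in zip(x, eps)]
        if t > 0:
            # Every step except the last re-injects a small amount of noise.
            sigma = math.sqrt(betas[t])
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


if __name__ == "__main__":
    target = [0.0, 0.5, 1.0, 0.25]  # toy four-pixel "design image"
    print(sample(target))
```

Because the oracle knows the target, the loop recovers it up to floating-point error. With a learned predictor the output would instead be a new sample consistent with the text and sketch guidance, which is the behaviour the abstract relies on.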

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper addresses the generation of creative artistic images guided by text and images. Specifically, the authors guide the generation process by combining text descriptions with sketch images to create high-quality artistic designs. The task matters for industrial applications because it can provide new sources of creativity for product design, help designers quickly generate multiple design schemes, and save time and resources. Existing techniques such as Generative Adversarial Networks (GANs), autoregressive models, and Variational Autoencoders (VAEs) have been used to generate images, but they have limited ability to generate images guided by text or sketches. The study therefore introduces diffusion models (especially Stable Diffusion) to overcome these limitations and achieve more precise and creative artistic image generation. Through experiments, the authors demonstrate the potential of diffusion models for generating creative artistic images, especially when text and image guidance are combined.
