Comparative Analysis of Generative Models: Enhancing Image Synthesis with VAEs, GANs, and Stable Diffusion

Sanchayan Vivekananthan

2024-08-16

Abstract: This paper examines three major generative modelling frameworks: Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Stable Diffusion models. VAEs are effective at learning latent representations but frequently yield blurry results. GANs can generate realistic images but face issues such as mode collapse. Stable Diffusion models produce high-quality images with strong semantic coherence but are computationally demanding. Additionally, the paper explores how incorporating Grounding DINO and Grounded SAM with Stable Diffusion improves image accuracy by utilising sophisticated segmentation and inpainting techniques. The analysis offers guidance on selecting suitable models for various applications and highlights areas for further research.

Computer Vision and Pattern Recognition, Image and Video Processing

What problem does this paper attempt to address?

The paper aims to address various issues in generative models for image synthesis and explores how to improve the performance of these models by combining different techniques. Specifically, the paper compares three main generative modeling frameworks: Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), and Stable Diffusion models. Each model has its advantages and limitations:

1. **Variational Autoencoders (VAEs)**:
   - Advantages: Capable of effectively learning latent representations, suitable for learning complex probability distributions.
   - Limitations: Generated images have blurry edges, and there is a posterior collapse issue.
2. **Generative Adversarial Networks (GANs)**:
   - Advantages: Can generate high-quality, realistic images.
   - Limitations: Training is unstable, prone to mode collapse, and requires high computational resources.
3. **Stable Diffusion models**:
   - Advantages: Generate high-resolution, detail-rich images while maintaining semantic consistency.
   - Limitations: The inference process is time-consuming and requires high computational resources.

Additionally, the paper explores methods to combine Grounding DINO and Grounded SAM with Stable Diffusion to improve the accuracy of image segmentation and object detection, thereby enhancing the effectiveness of image synthesis. While this approach improves image quality and consistency, it also increases computational complexity and the risk of overfitting.

Through these analyses, the paper aims to guide researchers and practitioners in selecting the most suitable generative model architecture for their specific needs and points out directions for future research.
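The posterior collapse issue mentioned for VAEs can be made concrete with a small sketch. It assumes the common VAE setup with a diagonal Gaussian encoder and a standard normal prior, which this summary does not spell out; the function names, the `beta` weight, and the example numbers are illustrative and not taken from the paper. In that setup the training loss is a reconstruction error plus a KL term, and collapse corresponds to the KL term shrinking to zero, so the latent code carries no information about the input.

```python
import math


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    if len(mu) != len(logvar):
        raise ValueError("mu and logvar must have the same length")
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def negative_elbo(recon_error, mu, logvar, beta=1.0):
    """Reconstruction error plus beta-weighted KL term (beta is an assumed knob)."""
    return recon_error + beta * kl_to_standard_normal(mu, logvar)


# A collapsed posterior matches the prior exactly, so the KL term vanishes.
collapsed = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])

# An informative posterior moves away from the prior and pays a KL cost.
informative = kl_to_standard_normal([1.5, -0.5], [-1.0, -2.0])

print(f"collapsed KL:   {collapsed:.4f}")
print(f"informative KL: {informative:.4f}")
```

Monitoring the KL term per latent dimension during training is one simple way to spot collapse: dimensions whose KL stays near zero are effectively unused by the decoder.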
