Abstract: Most GAN-based methods take semantic layouts as input for generating realistic images. However, these layouts consist primarily of object contours and lack detailed information, which leads to suboptimal quality in the generated images. To address this limitation, we propose a novel GAN architecture, LMCGAN, designed specifically for synthesizing high-quality images. LMCGAN introduces a generator network structured around the Laplacian pyramid, enabling the simultaneous generation of multi-scale feature maps. This approach allows the model to capture finer details at different resolutions, enhancing the overall realism of the generated images. To further improve the utilization of semantic maps, we integrate a multi-scale channel attention (MSCA) mechanism. This mechanism focuses on channel-specific information in complex scenes, which is crucial for preserving essential details that may otherwise be lost. During the feature fusion phase, we implement a feature fusion block (FFBL) designed to capture important relationships across scales. This block integrates information from different resolutions, ensuring that the final output retains critical features. Additionally, we combine conditional and unconditional methods to reduce noise during training, leading to more stable and effective training dynamics. Extensive experiments on challenging datasets demonstrate that LMCGAN significantly outperforms existing methods in both visual quality and quantitative evaluation metrics. The results indicate that our architecture not only generates more realistic images but also excels at preserving intricate details, marking a substantial advance in GAN-based image synthesis.
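To make the Laplacian pyramid idea behind the generator concrete, the following minimal sketch decomposes a small grayscale image into per-scale detail layers plus a coarse residual, then reconstructs it. It is not the LMCGAN generator: the function names are hypothetical, and 2x2 average pooling with nearest-neighbour upsampling is assumed here in place of the Gaussian filtering used in classical pyramids.

```python
# Illustrative Laplacian pyramid on a 2D grayscale image stored as nested lists.
# Assumptions (not from the paper): 2x2 average pooling and nearest-neighbour
# upsampling replace Gaussian blur/resampling; height and width must be
# divisible by 2 at every level.

def downsample(img):
    """Halve height and width by averaging each 2x2 block."""
    h, w = len(img), len(img[0])
    return [
        [
            (img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def upsample(img):
    """Double height and width by repeating each pixel in a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def subtract(a, b):
    """Element-wise a - b for two equally sized images."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    """Element-wise a + b for two equally sized images."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def build_laplacian_pyramid(img, levels):
    """Return [detail_0, ..., detail_{levels-1}, coarse_residual]."""
    pyramid = []
    current = img
    for _ in range(levels):
        smaller = downsample(current)
        # Detail layer: what is lost when the image is shrunk and re-enlarged.
        pyramid.append(subtract(current, upsample(smaller)))
        current = smaller
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Invert the decomposition, going from the coarsest scale to the finest."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = add(upsample(current), detail)
    return current


# Toy 8x8 input; any image with even dimensions at each level would do.
image = [[float((r * 8 + c) % 7) for c in range(8)] for r in range(8)]
pyramid = build_laplacian_pyramid(image, levels=2)
restored = reconstruct(pyramid)
```

Each detail layer holds the high-frequency content at one resolution, which is the property a pyramid-structured generator can exploit to produce outputs at several scales at once. How LMCGAN itself wires these scales together is not specified in the abstract.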

Text-to-Image Generation with Multiscale Semantic Context-Aware Generative Adversarial Networks

Learn, Imagine and Create: Text-to-Image Generation from Prior Knowledge.

SIMGAN: Photo-Realistic Semantic Image Manipulation Using Generative Adversarial Networks.

Multi-scale Dual-Modal Generative Adversarial Networks for Text-to-image Synthesis

Text to Image Synthesis with Multi-Granularity Feature Aware Enhancement Generative Adversarial Networks

Cross-Modal Semantic Matching Generative Adversarial Networks for Text-to-Image Synthesis

Multi-Sentence Complementarily Generation for Text-to-Image Synthesis

Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis.

DMF-GAN: Deep Multimodal Fusion Generative Adversarial Networks for Text-to-Image Synthesis

Language-vision Matching for Text-to-image Synthesis with Context-Aware GAN

CD-GAN: Commonsense-Driven Generative Adversarial Network with Hierarchical Refinement for Text-to-Image Synthesis

R-GAN: Exploring Human-like Way for Reasonable Text-to-Image Synthesis via Generative Adversarial Networks

Multi-Semantic Fusion Generative Adversarial Network for Text-to-Image Generation

MF-GAN: Multi-conditional Fusion Generative Adversarial Network for Text-to-Image Synthesis

GAN for Semantic Image Synthesis With Laplacian Pyramid and Multi-Scale Channel Attention

Multi-Sentence Auxiliary Adversarial Networks for Fine-Grained Text-to-Image Synthesis

DSE-GAN: Dynamic Semantic Evolution Generative Adversarial Network for Text-to-Image Generation

DM-GAN: Dynamic Memory Generative Adversarial Networks for Text-to-Image Synthesis

Text-to-image Generation Based on Spatial-Channel Attention and Semantic Redescription

Self-Modulated Feature Fusion GAN for Text-to-Image Synthesis