Abstract: Image captioning is the challenging task of generating descriptive sentences for images. Using semantic concepts to annotate images automatically has brought significant progress, but current frameworks have clear limitations, particularly in concept detection. Three factors contribute to the problem: incomplete labelling caused by biased annotations, the use of synonyms in training captions, and the large gap between the numbers of positive and negative concept samples. These shortcomings are a barrier to accurate image captioning, because unequal sample occurrences and missing training captions limit the model's ability to produce rich and varied image descriptions. To overcome these limitations, a novel approach is proposed that automatically generates images using a Weighted Stacked Generative Adversarial Network (WSGAN). This augmentation is intended to rectify the uneven distribution of concepts and thereby broaden the coverage of the training set. The proposed approach uses the WSGAN in conjunction with a Gated Recurrent Units (GRU)-based Deep Learning (DL) model and a Visual Attention Mechanism (VAM)-based DL model, which together form the GRU-VAM model that generates text captions for images. The model is trained on the MS COCO dataset combined, in several permutations, with a variety of original and machine-generated image datasets. The WSGAN-generated images correct the imbalance and incompleteness in the training dataset, which improves the model's ability to capture a wider variety of concepts. During testing and evaluation, the proposed WSGAN-GRU-VAM demonstrates significant improvements in image captioning metrics compared to existing models.
WSGAN-GRU-VAM outperforms other well-known image captioning algorithms, such as EnsCaption, Fast RF-UIC, RAGAN, and SAT-GPT-3, across several key evaluation metrics. Average increases in BLEU (8%), METEOR (7%), CIDEr (9%), and ROUGE-L (6%) reflect the model's capacity to produce image captions with improved linguistic accuracy, relevance, and coherence.
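To make the role of the visual attention mechanism concrete, the following is a minimal sketch of one soft-attention step of the kind commonly paired with a recurrent caption decoder. It is not the implementation evaluated here: the dot-product scoring, the function names `softmax` and `attend`, and the toy feature values are all assumptions made for illustration. The decoder's current hidden state scores each image region, the scores are normalised into weights, and the weighted sum of region features forms a context vector that a GRU could consume together with the previous word embedding.

```python
import math


def softmax(scores):
    """Normalise raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, hidden_state):
    """One soft-attention step (assumed dot-product scoring, hypothetical helper).

    region_features: list of per-region feature vectors (e.g. from a CNN grid).
    hidden_state:    decoder hidden state with the same dimensionality.
    Returns the attention weights and the attended context vector.
    """
    # Score each region by its similarity to the decoder state.
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted average of the region features.
    dim = len(region_features[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    # Toy example: three image regions with 2-dimensional features.
    regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    state = [2.0, 0.0]
    weights, context = attend(regions, state)
    print("attention weights:", weights)
    print("context vector:", context)
```

In this toy setting the first region aligns best with the decoder state and therefore receives the largest weight. A trained model would learn the scoring function rather than use a fixed dot product.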
