Abstract: When humans look at an image, they can easily tell what it is about, but this is much harder for a computer. Computer vision researchers have worked on this problem extensively, and it was long considered out of reach. With advances in deep learning techniques and the availability of large datasets and computing power, we can now build models that generate captions for images. Image caption generation is a popular research area in deep learning that combines image understanding with a natural-language description of the image. Generating well-formed sentences requires both syntactic and semantic understanding of the language. Describing the content of an image with accurately formed sentences is a very challenging task, but it could also have a great impact, for example by helping visually impaired people better understand the content of images. The biggest challenge is creating a description that captures not only the objects contained in an image but also how these objects relate to each other. This paper uses the Flickr_8K dataset and its Flickr8k_text folder, which contains Flickr8k.token, the main file of the dataset; it lists each image name with its respective caption, one entry per line (separated by newline, “\n”). A convolutional neural network (CNN), the pre-trained Xception model, is used to extract features from the image, and a long short-term memory (LSTM) network uses the CNN features to generate a description of the image. The Flickr8k_text folder also contains Flickr_8k.trainImages.txt, which lists the names of the 6000 images used for training. After the CNN-LSTM model is defined, the image caption generator is tested by passing an image file as a command-line parameter; it generates a caption for the image, and its accuracy is assessed by computing the BLEU score between the generated and reference captions.

Keywords: Image Caption Generator, Convolutional Neural Network, Long Short-Term Memory, BLEU score, Flickr_8K
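
The sketch below shows one way to read Flickr8k.token and Flickr_8k.trainImages.txt. It is a minimal illustration, not the paper's code. It assumes each caption line has the form `image.jpg#N<TAB>caption`, that is, the image name, a `#` caption index, a tab, and the caption. It also assumes the training list holds one image name per line. The function names (`load_captions`, `load_image_names`, `training_captions`) are hypothetical. File contents are passed in as strings, so no particular directory layout is assumed.

```python
from collections import defaultdict


def load_captions(token_text):
    """Map each image file name to the list of its captions.

    Assumed line format (hypothetical parsing rule): 'image.jpg#N<TAB>caption',
    where N is the index of the caption for that image.
    """
    captions = defaultdict(list)
    for line in token_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines such as a trailing newline
        image_id, caption = line.split("\t", 1)
        image_name = image_id.split("#", 1)[0]  # drop the '#N' caption index
        captions[image_name].append(caption.strip())
    return dict(captions)


def load_image_names(list_text):
    """Return the set of image names, assuming one name per line."""
    return {name.strip() for name in list_text.splitlines() if name.strip()}


def training_captions(token_text, train_list_text):
    """Keep only the captions of images named in the training list."""
    captions = load_captions(token_text)
    train_names = load_image_names(train_list_text)
    return {name: caps for name, caps in captions.items() if name in train_names}


if __name__ == "__main__":
    # Hypothetical toy input in the assumed format (not real Flickr8k data).
    token_text = "a.jpg#0\tA dog runs .\na.jpg#1\tA brown dog .\nb.jpg#0\tA cat sits .\n"
    print(training_captions(token_text, "a.jpg\n"))
    # {'a.jpg': ['A dog runs .', 'A brown dog .']}
```

With the real files, the inputs would be the text read from Flickr8k.token and Flickr_8k.trainImages.txt; their exact on-disk names and locations are assumptions here. The result maps each of the training images to its reference captions. Those captions can then serve both as LSTM training targets and as references for evaluation.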

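The BLEU evaluation described in the abstract can be illustrated with a small, dependency-free sketch of sentence-level BLEU. It computes clipped n-gram precisions for n = 1..4, combines them by a geometric mean with uniform weights, and multiplies by a brevity penalty. This is not necessarily the exact BLEU variant used in this paper. In particular, it applies no smoothing, so a caption with no matching n-gram at some order scores 0. The function names `ngrams` and `sentence_bleu` are illustrative. In practice, NLTK's `nltk.translate.bleu_score` module provides standard `sentence_bleu` and `corpus_bleu` implementations.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references, candidate, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing.

    references: list of token lists (e.g. the reference captions of one image)
    candidate:  token list produced by the caption generator
    """
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate has fewer than n tokens
        # Clip each candidate n-gram count by its maximum count in any one reference.
        max_ref_counts = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # log(0) is undefined, so unsmoothed BLEU is 0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty: use the reference length closest to the candidate
    # length, preferring the shorter reference on ties.
    c = len(candidate)
    r = min((len(ref) for ref in references), key=lambda ref_len: (abs(ref_len - c), ref_len))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


if __name__ == "__main__":
    # Hypothetical toy captions, tokenized by whitespace.
    reference = "the cat sat on a mat".split()
    generated = "the cat sat on the mat".split()
    print(round(sentence_bleu([reference], generated), 4))  # 0.5373
```

For an image with several reference captions (Flickr8k provides more than one per image), all of them would be passed as `references`. Tokenization choices such as lowercasing and punctuation handling change the score. They should therefore match whatever preprocessing was applied to the training captions.
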
Hyperparameter Analysis for Image Captioning

User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning

Comparative study of Transformer and LSTM Network with attention mechanism on Image Captioning

Image Caption Generator Using Deep Learning

Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

Pre-Trained CNN Architecture Analysis for Transformer-Based Indonesian Image Caption Generation Model

End-to-End Transformer Based Model for Image Captioning

Enhancing Image Captioning with Neural Models

Synthesis of Vision and Language: Multifaceted Image Captioning Application

Image Captioning In the Transformer Age

Image Captioning using Deep Neural Architectures

Pre-trained CNNs as Feature-Extraction Modules for Image Captioning

A Thorough Review on Recent Deep Learning Methodologies for Image Captioning

Delving Into Precise Attention In Image Captioning

Image Caption Generator Using DenseNet201 and ResNet50

Image Captioning using Deep Stacked LSTMs, Contextual Word Embeddings and Data Augmentation

Transformer with multi-level grid features and depth pooling for image captioning

Deep Learning Approaches on Image Captioning: A Review

A Study of ConvNeXt Architectures for Enhanced Image Captioning

Imageability- and Length-Controllable Image Captioning

Self-Distillation for Few-Shot Image Captioning (Supplementary Materials)