Abstract: Image captioning is one of the trending research areas in artificial intelligence. It is the process of generating descriptive text for the visual objects, image metadata, or entities present in an image. By combining computer vision and Natural Language Processing (NLP), a captioning system extracts features from the image, uses them to identify objects, actions, and the relationships among them, and generates a description of the image. The task is both extremely important and very difficult in computer vision research, and a large body of work has applied deep learning approaches to it. The goal of this article is to identify, evaluate, and summarize the works that examine deep learning applications in the context of image captioning systems. Using a systematic literature review (SLR) technique, we found 548 papers, of which 38 were identified as primary studies and underwent in-depth analysis. The results of this review show that LSTM, CNN, and RNN are the most commonly employed deep learning techniques for image captioning. Among the selected primary studies, the most commonly used datasets are MS COCO, Flickr8k, and Flickr30k, which are standardized benchmark datasets that researchers use to compare their methods on common test beds. The review also shows that BLEU, CIDEr, SPICE, METEOR, and ROUGE-L are the most frequently employed evaluation metrics. Despite the considerable advances achieved by deep learning approaches in this domain, there remains room for improvement. Finally, the review outlines future research directions for image captioning systems. We believe that this SLR will serve as a reference for other researchers and encourage them to gather the most recent data for their own evaluations.

Domain-specific image captioning: a comprehensive review

A comprehensive survey on image captioning: from handcrafted to deep learning-based techniques, a taxonomy and open research issues

A comprehensive review of image caption generation

Visuals to Text: A Comprehensive Review on Automatic Image Captioning

Deep Learning Approaches on Image Captioning: A Review

A Comprehensive Analysis of Real-World Image Captioning and Scene Identification

A Thorough Review on Recent Deep Learning Methodologies for Image Captioning

A comprehensive literature review on image captioning methods and metrics based on deep learning technique

Deep learning and knowledge graph for image/video captioning: A review of datasets, evaluation metrics, and methods

Image captioning using deep learning and python

A Comprehensive Survey of Deep Learning for Image Captioning

A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions

An Overview of Image Caption Generation Methods

Pixels to Prose: Understanding the art of Image Captioning

Video captioning – a survey

From vision to text: A comprehensive review of natural image captioning in medical diagnosis and radiology report generation

A Survey of Medical Image Captioning Technique: Encoding, Decoding and Latest Advance

Image Captioning using Deep Neural Architectures

From Show to Tell: A Survey on Deep Learning-based Image Captioning

Supervised Deep Learning Techniques for Image Description: A Systematic Review

A Survey on Biomedical Image Captioning