Improved Image Caption Generation With GPT-3 And Deep Learning Approach

Surendhar Soundararajan, Vinay Kukreja, Sheshanth Reddy Kallem, Laxman Teja Manchala, Hemanth Kunja, Shanmugasundaram Hariharan
DOI: https://doi.org/10.1109/ICRITO61523.2024.10522314
2024-03-14
Abstract: Image captioning, a growing area of natural language processing (NLP) research, involves generating a textual description of an image. The main goal of this work is to build a captioning mechanism on top of pre-trained convolutional neural networks (CNNs). In this study, the CNN performs feature extraction, and the extracted features are passed to a recurrent neural network (RNN) that generates the image captions. Several pre-trained CNN models are used to encode the image. The decoder, a GPT-3-based language model, constructs descriptive sentences. In addition, adding Bahdanau's attention model to GPT-3 improves performance by letting the model focus on specific image regions while learning. In an empirical evaluation on the MSCOCO dataset, the approach shows competitive performance.
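The attention step mentioned in the abstract can be illustrated with a minimal sketch of Bahdanau-style (additive) attention over image region features. This is not the authors' implementation: the function names, the dimensions (6 regions, 8-dimensional features, and so on), and the random weights are all assumptions chosen for illustration, and a real system would learn the weights and take the features from a pre-trained CNN.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(vec, mat):
    """Multiply a row vector (length d) by a d x k matrix, giving length k."""
    return [sum(vec[i] * mat[i][j] for i in range(len(vec)))
            for j in range(len(mat[0]))]


def additive_attention(features, hidden, W_feat, W_hid, v):
    """Additive attention: score_i = v . tanh(f_i W_feat + h W_hid).

    features: list of region feature vectors (hypothetical CNN output)
    hidden:   current decoder state vector (hypothetical)
    Returns the context vector and the attention weights over regions.
    """
    hid_proj = project(hidden, W_hid)
    scores = []
    for f in features:
        feat_proj = project(f, W_feat)
        scores.append(sum(v_k * math.tanh(a + b)
                          for v_k, a, b in zip(v, feat_proj, hid_proj)))
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the region features.
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(len(features[0]))]
    return context, weights


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Illustrative sizes only (assumptions, not taken from the paper).
rng = random.Random(0)
num_regions, feat_dim, hid_dim, attn_dim = 6, 8, 5, 4
features = rand_matrix(num_regions, feat_dim, rng)
hidden = [rng.gauss(0.0, 1.0) for _ in range(hid_dim)]
W_feat = rand_matrix(feat_dim, attn_dim, rng)
W_hid = rand_matrix(hid_dim, attn_dim, rng)
v = [rng.gauss(0.0, 1.0) for _ in range(attn_dim)]

context, weights = additive_attention(features, hidden, W_feat, W_hid, v)
```

At each decoding step the weights form a probability distribution over image regions, so the context vector fed to the language model emphasizes the regions most relevant to the next word.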
Computer Science