Automatic image captioning system based on augmentation and ranking mechanism

B. S. Revathi, A. Meena Kowshalya
DOI: https://doi.org/10.1007/s11760-023-02725-6
2023-09-04
Abstract: Automatically producing syntactically and semantically accurate captions remains an open challenge. This paper proposes an effective pretrained Augmentation–Ranking (A–R) image captioning model. The proposed model improves the properties of the images and produces appropriate captions. A novel augmentation strategy improves the operation of the convolutional neural network (CNN), while Ranking and Feedback Propagation improve the Long Short-Term Memory (LSTM) network. The model seeks to address the issues of complexity, vanishing gradients and context during training. The augmented CNN expands the image dataset used for training, and the Ranking LSTM uses ranks to help identify semantically appropriate captions. This combination improves the performance of the image captioning system. The proposed A–R model is examined under maximum and average pooling, using greedy and beam search. The outcomes are compared with state-of-the-art models such as the bidirectional recurrent neural network, Google NIC and Bi-LSTM combined with a semantic attention mechanism. The model is evaluated on the Flickr8k and Flickr30k datasets using measures including BLEU, METEOR and CIDEr. According to the experimental results, the proposed model, with reduced complexity, generated captions deemed accurate and syntactically and semantically correct, achieving an accuracy of 74.87%, above all baseline models.
Engineering, Electrical & Electronic; Imaging Science & Photographic Technology
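
The abstract says the model is examined with both greedy and beam search decoding. As a rough illustration of how the two strategies differ, the sketch below runs both over a tiny hand-made next-token table. The table `TOY_MODEL`, its probabilities, and the function names are hypothetical and invented for this example; they stand in for a trained caption decoder and are not taken from the paper.

```python
import math

# Hypothetical toy "decoder": next-token probabilities given the previous token.
# All values are invented for illustration; a real captioner would condition
# on image features and the full prefix.
TOY_MODEL = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.55, "cat": 0.45},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def next_probs(prefix):
    """Return the next-token distribution for a prefix (uses only the last token)."""
    return TOY_MODEL[prefix[-1]]


def greedy_search(max_len=5):
    """Pick the single most probable token at every step."""
    seq, logp = ["<s>"], 0.0
    while seq[-1] != "</s>" and len(seq) < max_len:
        probs = next_probs(seq)
        tok = max(probs, key=probs.get)
        logp += math.log(probs[tok])
        seq.append(tok)
    return seq, logp


def beam_search(beam_width=2, max_len=5):
    """Keep the beam_width highest-scoring partial captions at every step."""
    beams = [(["<s>"], 0.0)]
    for _ in range(max_len - 1):
        candidates = []
        for seq, logp in beams:
            if seq[-1] == "</s>":
                # Finished captions are carried forward unchanged.
                candidates.append((seq, logp))
                continue
            for tok, p in next_probs(seq).items():
                candidates.append((seq + [tok], logp + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == "</s>" for seq, _ in beams):
            break
    return beams[0]


greedy_seq, greedy_logp = greedy_search()
beam_seq, beam_logp = beam_search()
print(" ".join(greedy_seq), round(math.exp(greedy_logp), 2))
print(" ".join(beam_seq), round(math.exp(beam_logp), 2))
```

In this toy table, greedy search commits to "a" because it is the best first token and ends with a sequence of probability 0.33, while beam search with width 2 also keeps "the" alive and finds the higher-probability sequence (0.36). This is only meant to show why the two decoders can produce different captions from the same model.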