Image Caption Generator Using DenseNet201 and ResNet50

Vidhi Khubchandani
DOI: https://doi.org/10.18178/ijfcc.2024.13.3.618
2024-01-01
International Journal of Future Computer and Communication
Abstract: Image caption generation is an important research area in computer vision and natural language processing. This paper compares two popular Convolutional Neural Network (CNN) architectures, DenseNet201 and ResNet50, as feature extractors for the caption generation task. The study analyzes the impact of these architectures on the quality of the generated captions by measuring their learning curves and Bilingual Evaluation Understudy (BLEU) scores. The results show that the choice of CNN architecture significantly affects the performance of the captioning model. DenseNet201 and ResNet50 produce different learning curves and BLEU scores, indicating that the former is more effective at capturing high-level features, while the latter is better suited to capturing local features. These results will help in developing more accurate and efficient captioning models.
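For readers unfamiliar with the BLEU metric named in the abstract, the following is a minimal sketch of unsmoothed sentence-level BLEU in pure Python. It is not the paper's evaluation code: the function names `ngrams` and `sentence_bleu`, the uniform weights over 1- to 4-grams, and the whitespace tokenization in the usage example are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(references, candidate, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights.

    references: list of tokenized reference captions (hypothetical input format)
    candidate:  tokenized generated caption
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the modified precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "a dog runs on the green grass".split()
    generated = "a dog runs on the grass".split()
    print(sentence_bleu([reference], generated))
```

Because this sketch applies no smoothing, a caption that shares no 4-gram with any reference scores zero, which is one reason captioning work often reports BLEU-1 through BLEU-4 separately or uses a smoothed library implementation.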