Fast RF-UIC: A Fast Unsupervised Image Captioning Model.

Rui Yang, Xiayu Cui, Qinzhi Qin, Zhenrong Deng, Rushi Lan, Xiaonan Luo
DOI: https://doi.org/10.1016/j.displa.2023.102490
IF: 3.074
2023-01-01
Displays
Abstract: For visually impaired individuals, image captioning is a crucial task: a deep learning model recognizes an image and generates a descriptive sentence, enabling them to understand the content of the image through words. However, existing image captioning models need a large amount of manual annotation. Fortunately, the emergence of unsupervised methods provides a new approach to image captioning. Our proposed model, Fast RF-UIC, achieves unsupervised functionality through a purpose-designed Pre-trainer. Compared with existing pre-trained models, the Pre-trainer has a shorter training cycle. The R2-Inception-V4 model is designed as the encoder; it fuses the Res2Net structure with Inception-V4 to obtain more image features. Bi-FGRU is designed as the decoder, in which the FReLU activation function is used to improve character representation ability from two-dimensional space. Furthermore, we expanded the corpus used in existing unsupervised image captioning and included additional captions for common objects, effectively enhancing the model's generalization ability. In experiments, Fast RF-UIC achieved higher scores than existing unsupervised image captioning methods on several text evaluation metrics such as BLEU, ROUGE, and CIDEr.
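The abstract does not say how FReLU is wired into the Bi-FGRU decoder. As a rough illustration of the activation alone, the sketch below follows the commonly published definition of FReLU (funnel activation), y = max(x, T(x)), where T(x) is a small spatial window applied per channel. The function name `frelu`, the single-channel list-of-lists input, the zero padding, and the omission of the batch normalization that usually follows the window are all assumptions made for this example, not details taken from the paper.

```python
def frelu(x, kernel, bias=0.0):
    """Funnel activation on one 2D channel: out[i][j] = max(x[i][j], T(x)[i][j]).

    x      -- 2D list (H x W) of floats, a single feature-map channel
    kernel -- 3x3 list of weights for the spatial condition T (hypothetical values)
    bias   -- scalar added to the window response

    Assumption: zero padding at the borders, no batch normalization.
    """
    h, w = len(x), len(x[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Spatial condition T(x): weighted sum over the 3x3 neighbourhood.
            t = bias
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        t += kernel[di + 1][dj + 1] * x[ii][jj]
            # Unlike ReLU's max(x, 0), the threshold depends on the neighbours.
            row.append(max(x[i][j], t))
        out.append(row)
    return out


if __name__ == "__main__":
    feature_map = [[-1.0, 2.0], [3.0, -4.0]]
    zero_kernel = [[0.0] * 3 for _ in range(3)]
    # With an all-zero window and zero bias, FReLU reduces to plain ReLU.
    print(frelu(feature_map, zero_kernel))
```

With an all-zero kernel the threshold is constant at zero, so the function behaves like ReLU; a non-zero kernel makes the threshold depend on neighbouring values, which is the two-dimensional aspect the abstract alludes to.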