Model Semantic Attention (SemAtt) With Hybrid Learning Separable Neural Network and Long Short-Term Memory to Generate Caption

Agus Nursikuwagus, Rinaldi Munir, Masayu L. Khodra, Deshinta Arrova Dewi
DOI: https://doi.org/10.1109/access.2024.3481499
IF: 3.9
2024-10-29
IEEE Access
Abstract: Image captioning is an active research topic at the intersection of computer vision and natural language processing. In geology, one such task is describing images of geological rocks: a geologist writes a description of an image's content and records it as text for future use. One objective of this research is to interpret the object by examining the image structures in depth, focusing on shapes, colors, and structures to extract the image's features. The problem addressed is how a separable neural network (SNN) and long short-term memory (LSTM) affect the generated caption, and whether that caption can match the geologist's description. In the proposed image captioning architecture, the SNN is called Visual Attention (VaT) and the LSTM is called Semantic Attention (SemAtt). The experimental results report the model's captioning accuracy as BLEU- , BLEU- , BLEU- , and BLEU- . These scores are compared with other evaluation metrics, METEOR and ROUGE-L, which reach 0.670 and 0.623, respectively. The model outperforms the baseline model. Based on these evaluations, we conclude that the model can generate captions for geological rock images that meet the geologist's description. Precision and recall further support that the model predicts words suited to the image features.
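For readers unfamiliar with the caption metrics named in the abstract, the following is a minimal sketch of the two ideas behind them: the clipped n-gram precision that BLEU-n builds on, and the longest-common-subsequence F-measure used by ROUGE-L. It is not the authors' evaluation code. The function names and the two example captions are hypothetical, and a full BLEU score would additionally combine several n-gram orders with a brevity penalty.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams that occur in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: List[str], reference: List[str], n: int) -> float:
    """Clipped n-gram precision, the building block of BLEU-n."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: List[str], reference: List[str], beta: float = 1.0) -> float:
    """ROUGE-L F-measure computed from LCS-based precision and recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


# Hypothetical captions, invented for illustration only (not from the paper's dataset).
generated = "gray sandstone with fine grains".split()
geologist = "gray sandstone with coarse grains".split()

print(modified_precision(generated, geologist, 1))  # unigram precision
print(modified_precision(generated, geologist, 2))  # bigram precision
print(rouge_l(generated, geologist))                # LCS-based F-measure
```

Both functions compare a generated caption against a single reference. Published scores are typically computed over a whole test set and may use several reference captions per image.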
Computer Science, Information Systems; Telecommunications; Engineering, Electrical & Electronic