Design and development of counting-based visual question answering model using heuristic-based feature selection with deep learning

Tesfayee Meshu Welde, Lejian Liao
DOI: https://doi.org/10.1007/s10462-022-10385-0
IF: 9.588
2023-01-18
Artificial Intelligence Review
Abstract: Visual Question Answering (VQA) is a major research area that combines computer vision and natural language processing techniques. Among all question types, counting questions such as "How many?" are considered the most challenging, and VQA models still have difficulty counting the objects present in natural images. The basic VQA techniques either classify answers from fixed-length representations of both the question and the image, or sum fractional counts estimated from each image region. Soft attention in these methods has been identified as a primary cause of these counting difficulties. To circumvent this problem, the main aim of this paper is to implement a new visual question-answering system built around the counting scenario. First, standard benchmark VQA datasets are gathered; each sample in these datasets pairs an image with a question, so features are extracted from both. The questions are pre-processed by punctuation removal, stemming, and stop-word removal, after which word2vec features are extracted. Similarly, deep features of the images are extracted from the pooling layer of a Deep Convolutional Neural Network (DCNN). The two feature sets are combined and passed to an optimal feature selection procedure that retains the most significant features, i.e., those carrying unique information. The selection of optimal features is handled by the Optimized Deep Neural-Long Short-Term Memory (DN-LSTM). The Elephant Herding Optimization (EHO) algorithm used in the design requires less time and has lower computational complexity, can be applied to a wide range of engineering optimization problems, and can also tackle multilevel thresholding problems.
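The question and image feature-extraction steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny stop-word list, the suffix-stripping `toy_stem` (a stand-in for a real stemmer such as Porter's), the two-dimensional embedding table standing in for trained word2vec vectors, averaging word vectors into one question vector, global average pooling as the DCNN pooling layer, and concatenation as the way the two feature sets are combined are all hypothetical choices.

```python
import string

# Hypothetical, deliberately tiny stop-word list. "how" and "many" are kept
# out of it on purpose because they signal a counting question.
STOP_WORDS = {"a", "an", "the", "are", "is", "in", "on", "of", "there", "this"}

# Hypothetical suffixes for a toy stemmer; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "es", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess_question(question: str) -> list[str]:
    """Punctuation removal, lower-casing, stop-word removal, then stemming."""
    no_punct = question.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.lower().split()
    # Stop words are dropped before stemming so the list matches unstemmed words.
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


def question_feature(tokens: list[str], embeddings: dict[str, list[float]],
                     dim: int) -> list[float]:
    """Average the vectors of known tokens; zero vector if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def global_average_pool(feature_map: list[list[list[float]]]) -> list[float]:
    """Reduce a C x H x W feature map to one mean value per channel."""
    pooled = []
    for channel in feature_map:
        values = [x for row in channel for x in row]
        pooled.append(sum(values) / len(values))
    return pooled


def fuse(question_vec: list[float], image_vec: list[float]) -> list[float]:
    """Combine the two modalities by simple concatenation."""
    return question_vec + image_vec


if __name__ == "__main__":
    tokens = preprocess_question("How many dogs are in the picture?")
    # Hypothetical 2-d "word2vec" table standing in for trained embeddings.
    toy_embeddings = {"dog": [1.0, 0.0], "many": [0.0, 1.0]}
    q_vec = question_feature(tokens, toy_embeddings, dim=2)
    # Hypothetical 2-channel, 2 x 2 activation map standing in for DCNN output.
    img_vec = global_average_pool([[[1.0, 2.0], [3.0, 4.0]],
                                   [[0.0, 0.0], [0.0, 0.0]]])
    print(tokens, fuse(q_vec, img_vec))
```

Running the script prints `['how', 'many', 'dog', 'picture'] [0.5, 0.5, 2.5, 0.0]`: "are", "in" and "the" are dropped as stop words, "dogs" is stemmed to "dog", the two known words are averaged into the question vector, and each image channel is pooled to its mean before the two vectors are concatenated.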
The advantages of the Parameter Improved-Elephant Herding Optimization (PI-EHO) over conventional optimization algorithms motivate the choice of EHO in the designed method. Finally, answers are generated by a hybrid deep learning model combining Long Short-Term Memory (LSTM) and a Deep Neural Network (DNN), whose architecture is optimized by the proposed PI-EHO. The designed method is evaluated on different datasets and yields promising results compared with existing methods.
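The abstract does not specify which parameters PI-EHO improves, so the following is only a minimal sketch of the standard Elephant Herding Optimization operators that it builds on: in each clan the matriarch (best elephant) moves to `beta` times the clan centre, every other elephant moves a random fraction of `alpha` toward the matriarch, and the worst elephant is then replaced by a random one (the separating operator). The names `eho_minimize` and `sphere`, the parameter defaults, and the sphere test objective are illustrative assumptions. Using this for feature selection or architecture tuning would additionally require decoding each position into a feature mask or hyperparameter values (for example by thresholding), which is a further hypothetical choice not shown here.

```python
import random


def eho_minimize(objective, dim, lower, upper, n_clans=3, clan_size=5,
                 iterations=30, alpha=0.5, beta=0.1, seed=0):
    """Minimise `objective` over [lower, upper]^dim with standard EHO operators.

    Returns (best_position, best_value) among all positions evaluated.
    Assumes clan_size >= 2 and iterations >= 1.
    """
    rng = random.Random(seed)

    def clip(v):
        return min(upper, max(lower, v))

    def random_elephant():
        return [rng.uniform(lower, upper) for _ in range(dim)]

    clans = [[random_elephant() for _ in range(clan_size)] for _ in range(n_clans)]
    best_pos, best_val = None, float("inf")

    for _ in range(iterations):
        for clan in clans:
            # Rank the clan; clan[0] becomes the matriarch (best elephant).
            clan.sort(key=objective)
            val = objective(clan[0])
            if val < best_val:
                best_pos, best_val = list(clan[0]), val
            matriarch = clan[0]
            center = [sum(e[d] for e in clan) / clan_size for d in range(dim)]

            # Clan-updating operator: the matriarch moves to beta * clan centre,
            # every other elephant moves a random fraction toward the matriarch.
            updated = [[clip(beta * c) for c in center]]
            for e in clan[1:]:
                updated.append([clip(x + alpha * (m - x) * rng.random())
                                for x, m in zip(e, matriarch)])

            # Separating operator: the worst elephant leaves the clan and is
            # replaced by a new, uniformly random one.
            worst = max(range(clan_size), key=lambda i: objective(updated[i]))
            updated[worst] = random_elephant()
            clan[:] = updated
    return best_pos, best_val


def sphere(x):
    """Hypothetical test objective with its minimum (0) at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    pos, val = eho_minimize(sphere, dim=2, lower=-5.0, upper=5.0, seed=1)
    print(pos, val)
```

Because the matriarch update scales the clan centre toward the origin, standard EHO is biased toward solutions near zero; the sphere function, whose minimum lies at the origin, is therefore only an easy sanity check, not evidence of performance on a real feature-selection objective.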
computer science, artificial intelligence
What problem does this paper attempt to address?