Visual question answering on blood smear images using convolutional block attention module powered object detection

A. Lubna, Saidalavi Kalady, A. Lijiya
DOI: https://doi.org/10.1007/s00371-024-03359-6
IF: 2.835
2024-04-10
The Visual Computer
Abstract: The shape and number of the red blood cells, white blood cells and platelets in a person's blood are vital indicators of that person's health. Abnormalities in these characteristics may indicate diseases such as anaemia, leukaemia or thrombocytosis. Blood cells are conventionally counted through microscopic examination after suitable chemical substances have been applied to the blood. These conventional methods are labour-intensive, time-consuming and costly, and they require highly skilled medical professionals. This paper proposes a novel scheme for analysing an individual's blood sample with a visual question answering (VQA) system, which accepts a blood smear image as input and quickly answers questions about the sample, such as the number of blood cells and the nature of any abnormalities, without requiring a skilled medical professional. In VQA, the computer generates textual answers to questions about an input image. Solving this difficult problem requires visual understanding, question comprehension and deductive reasoning. The proposed approach uses a convolutional neural network for question categorisation and an object detector with an attention mechanism for visual comprehension. Experiments were conducted with two types of attention, (1) the convolutional block attention module (CBAM) and (2) the squeeze-and-excitation (SE) network; this attention-based design facilitates fast and reliable results. Because no public dataset was available, a VQA dataset was created for this study. The proposed system achieved an accuracy of 94% on numeric-response and yes/no questions and a BLEU score of 0.91.
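Both attention mechanisms named above are standard, published modules. The following is a minimal NumPy sketch of each, applied to a single feature map of shape (C, H, W), to show what they compute. It is an illustration of the general SE and CBAM designs, not the authors' implementation: the function names, the reduction ratio r = 4 and the random placeholder weights are hypothetical, and biases and batch dimensions are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlp(v, w1, w2):
    # Shared two-layer bottleneck: C -> C/r (ReLU) -> C. Biases omitted (simplification).
    return w2 @ np.maximum(w1 @ v, 0.0)

def se_block(feat, w1, w2):
    """Squeeze-and-excitation: reweight the channels of feat (C, H, W)."""
    squeeze = feat.mean(axis=(1, 2))            # global average pool -> (C,)
    excite = sigmoid(mlp(squeeze, w1, w2))      # per-channel weights in (0, 1)
    return feat * excite[:, None, None]

def conv2d_same(x, kernel):
    """Naive zero-padded 'same' cross-correlation of x (K, H, W) with kernel (K, k, k) -> (H, W)."""
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, w = x.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * kernel)
    return out

def cbam_block(feat, w1, w2, spatial_kernel):
    """CBAM: channel attention followed by spatial attention."""
    # Channel attention: shared MLP over average- and max-pooled descriptors.
    avg = feat.mean(axis=(1, 2))
    mx = feat.max(axis=(1, 2))
    ch = sigmoid(mlp(avg, w1, w2) + mlp(mx, w1, w2))
    feat = feat * ch[:, None, None]
    # Spatial attention: pool across channels, apply a 7x7 conv, squash to (0, 1).
    pooled = np.stack([feat.mean(axis=0), feat.max(axis=0)])   # (2, H, W)
    sp = sigmoid(conv2d_same(pooled, spatial_kernel))          # (H, W)
    return feat * sp[None, :, :]

# Hypothetical placeholder weights; a trained detector would learn these.
rng = np.random.default_rng(0)
C, H, W, r = 16, 8, 8, 4
feat = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1

print(se_block(feat, w1, w2).shape)            # (16, 8, 8)
print(cbam_block(feat, w1, w2, kernel).shape)  # (16, 8, 8)
```

SE reweights channels only, whereas CBAM adds a spatial map, so it can emphasise where in the smear the cells lie as well as which feature channels matter.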
The attention-based object recognition model of the proposed system counts red blood cells, white blood cells and platelets with accuracies of 97%, 100% and 98%, respectively, which are improvements of 1%, 0.06% and 1.61% over the state-of-the-art model.
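Once a detector has located the cells, counting and answering can reduce to tallying confident detections per class and routing the question by its category. The sketch below illustrates that idea under loud assumptions: the (label, confidence) detection format, the class names "RBC", "WBC" and "Platelets", the 0.5 confidence threshold and all function names are hypothetical, and the keyword-based `categorise` is a simple stand-in for the CNN question classifier described in the paper, not a reproduction of it.

```python
from collections import Counter

# Hypothetical detector output: one (class_label, confidence) pair per detected box.
Detection = tuple[str, float]

CLASSES = ("RBC", "WBC", "Platelets")  # hypothetical label names

def count_cells(detections: list[Detection], threshold: float = 0.5) -> dict[str, int]:
    """Count detections per class, keeping only boxes at or above the threshold."""
    counts = Counter(label for label, score in detections if score >= threshold)
    return {cls: counts.get(cls, 0) for cls in CLASSES}

def categorise(question: str) -> str:
    """Keyword stand-in (hypothetical) for the paper's CNN question classifier."""
    q = question.lower()
    if q.startswith("how many"):
        return "numeric"
    if q.startswith(("is ", "are ", "does ", "do ")):
        return "yes_no"
    return "other"

def target_class(question: str) -> str:
    """Pick the cell type a question refers to; defaults to red blood cells."""
    q = question.lower()
    if "white" in q or "wbc" in q:
        return "WBC"
    if "platelet" in q:
        return "Platelets"
    return "RBC"

def answer(question: str, detections: list[Detection]) -> str:
    """Answer numeric and yes/no questions from per-class detection counts."""
    n = count_cells(detections)[target_class(question)]
    kind = categorise(question)
    if kind == "numeric":
        return str(n)
    if kind == "yes_no":
        return "yes" if n > 0 else "no"
    return "unsupported question"

dets = [("RBC", 0.9), ("RBC", 0.8), ("RBC", 0.3), ("WBC", 0.95), ("Platelets", 0.6)]
print(answer("How many red blood cells are there?", dets))  # 2
print(answer("Are there any white blood cells?", dets))     # yes
```

The low-confidence RBC box (0.3) is discarded, so the reported count depends directly on how well the detector separates true cells from background, which is where an attention module such as CBAM or SE is meant to help.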
computer science, software engineering