Visual Question Answering based Educational Tool for Medical Students using Cross-ViT

Ms. Soudamini Somvanshi
DOI: https://doi.org/10.22214/ijraset.2024.62677
2024-05-31
International Journal for Research in Applied Science and Engineering Technology
Abstract: This paper introduces an advanced approach to medical visual question answering (VQA) using the Cross-ViT architecture. The model uses a dual-branch design to extract multi-scale feature representations from images and applies cross-attention between the branches to enhance the visual features. By combining these visual features with Stacked Attention Networks (SAN) and with semantic features extracted from the question text by an LSTM, the model achieves significant performance improvements. Experiments on various biomedical VQA tasks demonstrate notable improvements in retrieval accuracy and image-text correlation. The study highlights the potential of medical VQA systems to transform healthcare delivery, improve diagnostic accuracy, and support patient engagement and education, with promising future applications in telemedicine, surgical assistance, and integration with electronic health records.
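The cross-attention step between the two image branches can be illustrated with a minimal sketch. It follows the CLS-token cross-attention of the original CrossViT design rather than this paper's code, which the abstract does not give. The helper name `cls_cross_attention`, the single attention head, the identity query/key/value/output projections, and the equal embedding widths of the two branches are all simplifying assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cls_cross_attention(cls_a: Vector, patches_b: List[Vector]) -> Vector:
    """Fuse branch A's CLS token with branch B's patch tokens (hypothetical helper).

    Simplifying assumptions, not taken from the paper: a single attention head,
    identity query/key/value/output projections, and equal embedding widths
    for both branches.
    """
    # Only the CLS token of branch A acts as the query; the keys and values
    # are that CLS token concatenated with the patch tokens of branch B.
    keys = [cls_a] + patches_b
    scale = 1.0 / math.sqrt(len(cls_a))
    weights = softmax([dot(cls_a, k) * scale for k in keys])
    # Weighted sum of the values (identical to the keys under the identity
    # projection assumption).
    attended = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(cls_a))]
    # Residual connection keeps the original CLS content.
    return [c + a for c, a in zip(cls_a, attended)]


if __name__ == "__main__":
    # Toy 2-dimensional tokens for a fine-patch (small) and a coarse-patch
    # (large) branch; real models use far wider embeddings.
    small_cls, small_patches = [1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]]
    large_cls, large_patches = [0.0, 1.0], [[1.0, 1.0]]
    # Each branch's CLS token attends to the other branch's patch tokens.
    fused_small_cls = cls_cross_attention(small_cls, large_patches)
    fused_large_cls = cls_cross_attention(large_cls, small_patches)
    print(fused_small_cls, fused_large_cls)
```

Because only the CLS token issues a query, each fusion step costs time linear in the number of patch tokens of the other branch. This is why CrossViT fuses branches through their CLS tokens instead of running full token-to-token attention between them.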