Candidate-Heuristic In-Context Learning: A new framework for enhancing medical visual question answering with LLMs
Xiao Liang, Di Wang, Haodi Zhong, Quan Wang, Ronghan Li, Rui Jia, Bo Wan
DOI: https://doi.org/10.1016/j.ipm.2024.103805
IF: 7.466
2024-06-23
Information Processing & Management
Abstract: Medical Visual Question Answering (MedVQA) is designed to answer natural language questions related to medical images. Existing methods, which largely adopt the cross-modal pre-training and fine-tuning paradigm, face limitations in accuracy due to data scarcity and insufficient incorporation of extensive medical knowledge. Drawing inspiration from the Knowledge-Based Visual Question Answering (KB-VQA) domain, which leverages Large Language Models (LLMs) and external knowledge bases, we introduce the Candidate-Heuristic In-Context Learning (CH-ICL) framework, a novel approach that uses LLMs augmented with external knowledge to directly enhance existing MedVQA models. Specifically, we collect a pathology terminology dictionary from a public digital pathology library as an external knowledge base and use it to train a knowledge scope discriminator, which helps identify the knowledge scope required to answer a question. Then, we employ existing MedVQA models to provide reliable answer candidates along with their confidence scores. Finally, the knowledge scope and candidates, combined with retrieved in-context exemplars, are aggregated into prompts that heuristically guide LLMs in answer generation. Experimental results on the PathVQA, VQA-RAD, and SLAKE public benchmarks show state-of-the-art performance, with improvements of 1.91%, 1.88%, and 2.17%, respectively, over the baseline. Code and dataset are available at https://github.com/ecoxial2007/CH-ICL.
computer science, information systems; information science & library science
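
The prompt-aggregation step described in the abstract (knowledge scope, answer candidates with confidence scores, and retrieved in-context exemplars combined into one prompt) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Exemplar` class, the `format_candidates` and `build_prompt` functions, the field labels, and the instruction wording are all hypothetical, and the retrieval of exemplars, the knowledge scope discriminator, and the MedVQA model are assumed to have already produced their outputs.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical container for one retrieved in-context exemplar.
# The field names are illustrative, not taken from the CH-ICL code base.
@dataclass
class Exemplar:
    question: str
    knowledge_scope: str
    candidates: List[Tuple[str, float]]  # (answer, confidence score)
    answer: str


def format_candidates(candidates: List[Tuple[str, float]]) -> str:
    """Render candidates from highest to lowest confidence."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return ", ".join(f"{answer} ({score:.2f})" for answer, score in ranked)


def build_prompt(
    question: str,
    knowledge_scope: str,
    candidates: List[Tuple[str, float]],
    exemplars: List[Exemplar],
) -> str:
    """Aggregate exemplars and the test question into a single text prompt.

    The layout (labels, ordering, instruction sentence) is an assumption
    made for illustration only.
    """
    blocks = [
        "Answer the question using the knowledge scope and the candidate answers."
    ]
    # Each exemplar is shown in full, including its known answer.
    for ex in exemplars:
        blocks.append(
            f"Question: {ex.question}\n"
            f"Knowledge scope: {ex.knowledge_scope}\n"
            f"Candidates: {format_candidates(ex.candidates)}\n"
            f"Answer: {ex.answer}"
        )
    # The test question ends with an open "Answer:" slot for the LLM to fill.
    blocks.append(
        f"Question: {question}\n"
        f"Knowledge scope: {knowledge_scope}\n"
        f"Candidates: {format_candidates(candidates)}\n"
        f"Answer:"
    )
    return "\n\n".join(blocks)
```

Sorting candidates by confidence before rendering is one plausible design choice: it places the MedVQA model's preferred answer first while still exposing alternatives to the LLM. The resulting string would then be sent to whichever LLM is in use; that call is left out because it depends on the provider.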