Prior-Posterior Knowledge Prompting-and-Reasoning for Surgical Visual Question Localized-Answering

Xin Yang, Peixi Peng, Dongsheng Zhou, Wenfei Liu, Wanshu Fan
DOI: https://doi.org/10.1109/IJCNN60899.2024.10650493
2024-06-30
Abstract: Surgical Visual Question Localized-Answering (VQLA) aims to locate the specific instance area while responding to the associated question. It has the potential to help junior resident doctors understand the surgical process and to offer decision support for surgeons. Yet this task remains challenging for data-driven neural networks because of their heavy reliance on the surgical scene information provided by posterior knowledge. Hence, we propose a prior-posterior knowledge prompting-and-reasoning (PPKPR) method that imitates the operational mode of a surgeon: surgeons systematically inspect each instance within the surgical scene and its associated question, then rely on their accumulated work experience to understand and answer the questions. PPKPR comprises three modules: the prior-posterior multi-domain knowledge prompter (PPMP), the prior-posterior instance knowledge prompter (PPIP), and the posterior knowledge reasoner (PKR). Specifically, PPMP aligns prior-posterior multi-domain knowledge, prompting the model to alleviate misinterpretations of the textual question. PPIP provides prior instance knowledge, ensuring that the model focuses on the correct areas of the visual scene. The prompted knowledge is then refined by PKR to reason out the final answer. Experimental results demonstrate that our method performs favorably against state-of-the-art methods on the EndoVis-18 and EndoVis-17 datasets.
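The abstract names the roles of the three modules but not their internals. As a rough illustration of the data flow only, the pure-Python toy below chains hypothetical stand-ins `ppmp`, `ppip`, and `pkr`. In it, question tokens attend to prior domain prompts, prior instance prompts are prepended to the visual tokens, and a reasoner pools the prompted tokens to predict an answer class plus a box. The residual attention, token prepending, mean pooling, random weights, and the single normalized `(cx, cy, w, h)` box output are all assumptions chosen for illustration, not the authors' architecture.

```python
# Hypothetical toy sketch of a prompt-then-reason VQLA data flow.
# None of these functions come from the PPKPR paper; they only trace shapes.
import math
import random
from typing import List, Tuple

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over keys (keys double as values)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]


def linear(weights: List[Vector], x: Vector) -> Vector:
    """Matrix-vector product: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def ppmp(question: List[Vector], prior_domain: List[Vector]) -> List[Vector]:
    """Hypothetical PPMP stand-in: each question token attends to prior
    domain-knowledge prompts and adds the result back (residual)."""
    return [[q + a for q, a in zip(tok, attend(tok, prior_domain))] for tok in question]


def ppip(scene: List[Vector], prior_instances: List[Vector]) -> List[Vector]:
    """Hypothetical PPIP stand-in: prepend prior instance prompts to the
    visual tokens so that later attention can draw on them."""
    return prior_instances + scene


def pkr(tokens: List[Vector], w_cls: List[Vector], w_box: List[Vector]) -> Tuple[int, Vector]:
    """Hypothetical PKR stand-in: mean-pool the prompted tokens, refine the
    pooled vector with one attention step, then predict an answer class and
    an assumed (cx, cy, w, h) box squashed into (0, 1) by a sigmoid."""
    dim = len(tokens[0])
    pooled = [sum(t[i] for t in tokens) / len(tokens) for i in range(dim)]
    refined = attend(pooled, tokens)
    logits = linear(w_cls, refined)
    answer = max(range(len(logits)), key=logits.__getitem__)
    box = [1.0 / (1.0 + math.exp(-z)) for z in linear(w_box, refined)]
    return answer, box


def ppkpr_forward(question, scene, prior_domain, prior_instances, w_cls, w_box):
    """Toy end-to-end pass: prompt the text, prompt the vision, then reason jointly."""
    text = ppmp(question, prior_domain)
    visual = ppip(scene, prior_instances)
    return pkr(text + visual, w_cls, w_box)


# Demo with random stand-in features (seeded for reproducibility).
rng = random.Random(0)
DIM, NUM_CLASSES = 8, 5


def rand_tokens(n: int) -> List[Vector]:
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(n)]


answer, box = ppkpr_forward(
    question=rand_tokens(6),         # embedded question tokens
    scene=rand_tokens(10),           # visual patch features
    prior_domain=rand_tokens(3),     # prior multi-domain prompts
    prior_instances=rand_tokens(4),  # prior instance prompts
    w_cls=rand_tokens(NUM_CLASSES),  # answer-classifier weights
    w_box=rand_tokens(4),            # box-regressor weights
)
print(answer, [round(b, 3) for b in box])
```

Replacing the random vectors with real question embeddings, visual features, and learned weights would be needed for anything beyond tracing how prompted text and visual tokens feed a joint answer-and-locate head.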
Computer Science, Medicine