Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering

Awais Naeem, Tianhao Li, Huang-Ru Liao, Jiawei Xu, Aby M. Mathew, Zehao Zhu, Zhen Tan, Ajay Kumar Jaiswal, Raffi A. Salibian, Ziniu Hu, Tianlong Chen, Ying Ding
2024-11-26
Abstract: Accurate diagnosis and prognosis assisted by pathology images are essential for cancer treatment selection and planning. Despite the recent trend of adopting deep-learning approaches for analyzing complex pathology images, they fall short as they often overlook the domain-expert understanding of tissue structure and cell composition. In this work, we focus on a challenging Open-ended Pathology VQA (PathVQA-Open) task and propose a novel framework named Path-RAG, which leverages HistoCartography to retrieve relevant domain knowledge from pathology images and significantly improves performance on PathVQA-Open. Admitting the complexity of pathology image analysis, Path-RAG adopts a human-centered AI approach by retrieving domain knowledge using HistoCartography to select the relevant patches from pathology images. Our experiments suggest that domain guidance can significantly boost the accuracy of LLaVA-Med from 38% to 47%, with a notable gain of 28% for H&E-stained pathology images in the PathVQA-Open dataset. For longer-form question and answer pairs, our model consistently achieves significant improvements of 32.5% in ARCH-Open PubMed and 30.6% in ARCH-Open Books on H&E images. Our code and dataset are available at https://github.com/embedded-robotics/path-rag.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses open-ended question answering in the Pathology Visual Question Answering (PathVQA) task. Specifically, the authors focus on improving how models understand and interpret complex pathology images, particularly when answering open-ended questions.

#### Main problems:

1. **Limitations of existing methods**: Existing deep-learning methods often overlook the domain-expert understanding of tissue structure and cell composition when analyzing complex pathology images, which hurts performance.
2. **Challenges of open-ended questions**: For open-ended questions (such as describing specific features in a pathology image), existing Vision-Language Models (VLMs) have difficulty identifying fine-grained visual objects and text entities.
3. **Lack of domain-knowledge guidance**: Traditional deep-learning methods usually do not incorporate pathology expertise, which leaves gaps in how key regions are selected and interpreted.

#### Solutions:

To address these problems, the authors propose a new framework named **Path-RAG**, which uses the **HistoCartography** tool to retrieve relevant domain knowledge from pathology images and select the most relevant image regions (patches). Specifically:

- **Knowledge-guided key-region retrieval**: Through the HistoCartography tool, Path-RAG identifies key regions in pathology images, especially regions rich in cell nuclei.
- **Combination of multimodal language models**: The LLaVA-Med model generates descriptions or candidate answers for each key region, and this information is passed to GPT-4 for final reasoning.
- **Improved open-ended question answering**: By introducing domain knowledge, Path-RAG significantly improves the accuracy of answers to open-ended questions, especially for H&E-stained pathology images.

#### Experimental results:

Experiments show that Path-RAG outperforms existing methods on multiple datasets, especially on open-ended questions, with a significant improvement in recall. For example, on the PathVQA dataset, Path-RAG increases the recall of LLaVA-Med from 38% to 47%. On H&E images in the ARCH-Open dataset, the abstract reports improvements of 32.5% (ARCH-Open PubMed) and 30.6% (ARCH-Open Books).

In conclusion, the paper addresses the challenges of open-ended question answering on pathology images by introducing domain knowledge and combining multimodal models, with the aim of supporting more accurate diagnosis and treatment planning.
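
The retrieve-then-reason flow described under Solutions can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes nuclei centroids have already been detected (in the paper this role is played by HistoCartography), and the grid tiling, the top-k selection by nuclei count, and every name here (`Patch`, `rank_patches`, `answer_question`, `describe_patch`, `reason`) are hypothetical. The two callables stand in for LLaVA-Med and GPT-4 and are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Patch:
    """A square image region and the number of nuclei found inside it."""
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    size: int    # side length in pixels
    nuclei: int  # nuclei centroids falling inside the patch


def rank_patches(
    centroids: Sequence[Tuple[float, float]],
    width: int,
    height: int,
    patch_size: int,
    top_k: int,
) -> List[Patch]:
    """Tile the image into a non-overlapping grid and return the top_k
    tiles with the most nuclei (assumed selection rule, not the paper's)."""
    counts: Dict[Tuple[int, int], int] = {}
    for cx, cy in centroids:
        # Ignore detections that fall outside the image bounds.
        if 0 <= cx < width and 0 <= cy < height:
            key = (int(cx) // patch_size, int(cy) // patch_size)
            counts[key] = counts.get(key, 0) + 1

    patches = [
        Patch(col * patch_size, row * patch_size, patch_size, n)
        for (col, row), n in counts.items()
    ]
    # Densest first; ties are broken by position so the order is deterministic.
    patches.sort(key=lambda p: (-p.nuclei, p.y, p.x))
    return patches[:top_k]


def answer_question(
    question: str,
    centroids: Sequence[Tuple[float, float]],
    width: int,
    height: int,
    describe_patch: Callable[[Patch, str], str],
    reason: Callable[[str, List[str]], str],
    patch_size: int = 224,
    top_k: int = 3,
) -> str:
    """Retrieve nuclei-rich patches, collect one candidate answer per patch,
    then let a second model combine the candidates into a final answer."""
    patches = rank_patches(centroids, width, height, patch_size, top_k)
    candidates = [describe_patch(patch, question) for patch in patches]
    return reason(question, candidates)
```

In use, `describe_patch` would crop the patch and query a vision-language model, while `reason` would prompt a text model with the question and the candidate answers. Keeping both as injected callables lets the region-selection logic be tested without either model.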