Abstract: Accurate diagnosis and prognosis assisted by pathology images are essential for cancer treatment selection and planning. Despite the recent trend of adopting deep-learning approaches for analyzing complex pathology images, these approaches fall short because they often overlook the domain-expert understanding of tissue structure and cell composition. In this work, we focus on a challenging Open-ended Pathology VQA (PathVQA-Open) task and propose a novel framework named Path-RAG, which leverages HistoCartography to retrieve relevant domain knowledge from pathology images and significantly improves performance on PathVQA-Open. Acknowledging the complexity of pathology image analysis, Path-RAG adopts a human-centered AI approach by retrieving domain knowledge using HistoCartography to select the relevant patches from pathology images. Our experiments suggest that domain guidance can significantly boost the accuracy of LLaVA-Med from 38% to 47%, with a notable gain of 28% for H&E-stained pathology images in the PathVQA-Open dataset. For longer-form question and answer pairs, our model consistently achieves significant improvements of 32.5% in ARCH-Open PubMed and 30.6% in ARCH-Open Books on H&E images. Our code and dataset are available at https://github.com/embedded-robotics/path-rag.

What problem does this paper attempt to address?

This paper targets open-ended questions in the Pathology Visual Question Answering (PathVQA) task. Specifically, the authors focus on improving how well models understand and interpret complex pathology images, in particular how accurately they answer open-ended questions about them.

#### Main problems:

1. **Limitations of existing methods**: Existing deep-learning methods often overlook domain experts' understanding of tissue structure and cell composition when analyzing complex pathology images, which hurts performance.
2. **Challenges of open-ended questions**: For open-ended questions (such as describing specific features in a pathology image), existing Vision-Language Models (VLMs) have difficulty identifying fine-grained visual objects and text entities, especially in pathology images.
3. **Lack of domain-knowledge guidance**: Traditional deep-learning methods usually do not incorporate pathology expertise, so they are weak at selecting and interpreting the key regions of an image.

#### Solutions:

To address these problems, the authors propose a new framework named **Path-RAG**, which uses the **HistoCartography** tool to retrieve relevant domain knowledge from pathology images and select the most relevant image regions (patches). Specifically:

- **Knowledge-guided key-region retrieval**: Using HistoCartography, Path-RAG identifies key regions in pathology images, especially regions rich in cell nuclei.
- **Combination of multi-modal language models**: The LLaVA-Med model generates descriptions or candidate answers for each key region, and this information is passed to GPT-4 for final reasoning.
- **Improved open-ended question answering**: By introducing domain knowledge, Path-RAG significantly improves the accuracy of answers to open-ended questions, especially for H&E-stained pathology images.

#### Experimental results:

Experiments show that Path-RAG outperforms existing methods on multiple datasets, with a significant improvement in recall on open-ended questions. For example, on the PathVQA dataset, Path-RAG raises the recall of LLaVA-Med from 38% to 47%. It also improves results on the ARCH-Open dataset, where the abstract reports gains of 32.5% on ARCH-Open PubMed and 30.6% on ARCH-Open Books for H&E images.

In conclusion, this paper addresses the challenges of answering open-ended questions about pathology images by introducing domain knowledge and combining multi-modal models, which the authors motivate by the role of pathology images in diagnosis and treatment planning.
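
The retrieve-then-reason flow summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Patch` fields, the ranking of patches purely by nuclei count, the `top_k` cutoff, and the two callables standing in for LLaVA-Med and GPT-4 are all assumptions made for illustration. No real HistoCartography, LLaVA-Med, or GPT-4 API is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Patch:
    """A candidate image region (hypothetical structure, for illustration only)."""
    x: int
    y: int
    size: int
    nuclei_count: int  # assumed to be produced upstream by a nuclei detector


def select_key_patches(patches: Sequence[Patch], top_k: int = 3) -> List[Patch]:
    """Return the top_k patches with the most nuclei (assumed ranking rule).

    sorted() is stable, so patches with equal counts keep their input order.
    """
    return sorted(patches, key=lambda p: p.nuclei_count, reverse=True)[:top_k]


def answer_question(
    question: str,
    patches: Sequence[Patch],
    describe_patch: Callable[[Patch, str], str],  # stand-in for a VLM such as LLaVA-Med
    reason: Callable[[str, List[str]], str],      # stand-in for an LLM such as GPT-4
    top_k: int = 3,
) -> str:
    """Describe each key patch, then let the reasoner combine the candidates."""
    key_patches = select_key_patches(patches, top_k)
    candidates = [describe_patch(p, question) for p in key_patches]
    return reason(question, candidates)


# Toy usage with stub models; real models would replace the two lambdas.
demo_patches = [
    Patch(0, 0, 256, nuclei_count=5),
    Patch(256, 0, 256, nuclei_count=40),
    Patch(0, 256, 256, nuclei_count=12),
    Patch(256, 256, 256, nuclei_count=33),
]
demo_answer = answer_question(
    "What is present in the image?",
    demo_patches,
    describe_patch=lambda p, q: f"region at ({p.x}, {p.y}) with {p.nuclei_count} nuclei",
    reason=lambda q, cands: "; ".join(cands),
    top_k=2,
)
print(demo_answer)
```

Passing the describer and the reasoner as callables keeps the sketch independent of any particular model API, so the patch-selection step can be examined on its own.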

Path-RAG: Knowledge-Guided Key Region Retrieval for Open-ended Pathology Visual Question Answering
