Aligning Knowledge Concepts to Whole Slide Images for Precise Histopathology Image Analysis

Weiqin Zhao, Ziyu Guo, Yinshuang Fan, Yuming Jiang, Maximus Yeung, Lequan Yu
2024-11-27
Abstract: Due to their large size and lack of fine-grained annotation, Whole Slide Images (WSIs) are commonly analyzed as a Multiple Instance Learning (MIL) problem. However, previous studies learn only from training data, in stark contrast to how human clinicians teach each other and reason about histopathologic entities and factors. Here we present a novel knowledge concept-based MIL framework, named ConcepPath, to fill this gap. Specifically, ConcepPath utilizes GPT-4 to induce reliable disease-specific human expert concepts from medical literature and incorporates them with a group of purely learnable concepts to extract complementary knowledge from training data. In ConcepPath, WSIs are aligned to these linguistic knowledge concepts by utilizing a pathology vision-language model as the basic building component. In lung cancer subtyping, breast cancer HER2 scoring, and gastric cancer immunotherapy-sensitive subtyping tasks, ConcepPath significantly outperformed previous SOTA methods, which lack the guidance of human expert knowledge.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper aims to solve several key problems in whole-slide image (WSI) analysis:

1. **Large scale and lack of fine-grained annotation**: WSIs are very large (for example, 150,000 x 150,000 pixels) and usually lack detailed annotations, which makes it difficult to apply traditional supervised learning methods to them directly.
2. **Existing methods rely only on image data**: Most existing computational pathology methods learn mainly from image data and ignore the knowledge and reasoning of human experts. This differs significantly from how clinicians teach and understand pathological entities and factors.
3. **Limitations of multiple instance learning (MIL) methods**: Although MIL methods support weakly supervised learning from slide-level labels, they perform poorly on complex tasks, especially those that require identifying complex tissue structures and molecular features.
4. **Reliability issues of language-prior generation**: Some studies attempt to use language priors to assist WSI analysis, but in a fully trained setting these methods show unreliable language-prior generation and unsatisfactory performance.

To solve these problems, the authors propose a new framework, ConcepPath. It improves the accuracy and interpretability of WSI analysis by combining human expert knowledge with new concepts learned from training data. Specifically:

- **Introducing human expert knowledge**: ConcepPath uses large language models (such as GPT-4) to derive reliable disease-specific human expert concepts from medical literature and combines them with learnable concepts that extract complementary knowledge from the training data.
- **Aligning language and image**: WSIs are aligned with these linguistic knowledge concepts through a pathology vision-language model, so that expert knowledge is used more effectively.
- **Two-stage concept-guided hierarchical feature aggregation**: Instance features are first aggregated into concept-specific bag-level features. These are then aggregated further according to the correlation between the instance-level concepts and the bag-level expert class prompts.
- **Slide adapter**: To address the domain gap between the training data of the pathology vision-language model and downstream WSI analysis tasks, ConcepPath integrates a slide adapter before the final prediction.

Through these innovations, ConcepPath significantly outperforms existing state-of-the-art methods on several complex WSI analysis tasks: lung cancer subtype classification, breast cancer HER2 scoring, and gastric cancer immunotherapy-sensitivity subtype classification.
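
The two-stage concept-guided aggregation described above can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. It assumes that patch features, concept embeddings, and class-prompt embeddings already live in a shared embedding space (as a vision-language model would provide), and it uses softmax attention over cosine similarities as a stand-in for whatever weighting ConcepPath actually learns. The function names, the `temperature` parameter, and the toy 3-dimensional vectors are all hypothetical. The learnable concepts and the slide adapter are omitted.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale to unit length so that dot products become cosine similarities.
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]

def softmax(scores, temperature=1.0):
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]

def aggregate_slide(patches, concepts, class_prompts, temperature=0.1):
    """Toy two-stage aggregation (assumed form, not the paper's exact method)."""
    patches = [normalize(p) for p in patches]
    concepts = [normalize(c) for c in concepts]
    class_prompts = [normalize(k) for k in class_prompts]

    # Stage 1: for each instance-level concept, attend over patches and
    # pool them into one concept-specific bag-level feature.
    bag_feats = []
    for concept in concepts:
        attn = softmax([dot(p, concept) for p in patches], temperature)
        bag_feats.append(weighted_sum(attn, patches))

    # Stage 2: for each class prompt, weight the concept-specific bag
    # features by concept/class-prompt similarity, then score the pooled
    # slide feature against that class prompt.
    logits = []
    for prompt in class_prompts:
        weights = softmax([dot(c, prompt) for c in concepts], temperature)
        slide_feat = normalize(weighted_sum(weights, bag_feats))
        logits.append(dot(slide_feat, prompt))
    return bag_feats, logits

# Toy slide: three patch embeddings that mostly point along axis 0.
patches = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.0, 0.6]]
# Two hypothetical concept embeddings and two class-prompt embeddings.
concepts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
class_prompts = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.0]]

bag_feats, logits = aggregate_slide(patches, concepts, class_prompts)
predicted_class = max(range(len(logits)), key=lambda i: logits[i])
print(logits, predicted_class)
```

In this toy setup the slide contains almost no evidence for the second concept, so the first class prompt receives the higher score. In a real pipeline the embeddings would come from a pathology vision-language model, and the aggregation weights would be trained end to end with slide-level labels.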