Abstract: Whole slide image (WSI) analysis is gaining prominence within the medical imaging field. Recent advances in pathology foundation models have shown the potential to extract powerful feature representations from WSIs for downstream tasks. However, these foundation models are usually designed for general-purpose pathology image analysis and may not be optimal for specific downstream tasks or cancer types. In this work, we present Concept Anchor-guided Task-specific Feature Enhancement (CATE), an adaptable paradigm that can boost the expressivity and discriminativeness of pathology foundation models for specific downstream tasks. Based on a set of task-specific concepts derived from the pathology vision-language model with expert-designed prompts, we introduce two interconnected modules to dynamically calibrate the generic image features extracted by foundation models for certain tasks or cancer types. Specifically, we design a Concept-guided Information Bottleneck module to enhance task-relevant characteristics by maximizing the mutual information between image features and concept anchors while suppressing superfluous information. Moreover, a Concept-Feature Interference module is proposed to utilize the similarity between calibrated features and concept anchors to further generate discriminative task-specific features. The extensive experiments on public WSI datasets demonstrate that CATE significantly enhances the performance and generalizability of MIL models. Additionally, heatmap and UMAP visualization results also reveal the effectiveness and interpretability of CATE. The source code is available at https://github.com/HKU-MedAI/CATE.

What problem does this paper attempt to address?

This paper addresses the suboptimal performance of pathology foundation models on specific downstream tasks and cancer types. Although existing pathology foundation models can extract powerful feature representations from whole slide images (WSIs), they are usually designed for general-purpose pathology image analysis and are not optimized for any particular task or cancer type. The features they extract may therefore contain task-irrelevant information, which can impair performance on specific downstream tasks.

To overcome this challenge, the authors propose **Concept Anchor-guided Task-specific Feature Enhancement (CATE)**. CATE enhances a pathology foundation model through two modules:

1. **Concept-guided Information Bottleneck (CIB)**: enhances task-relevant features by maximizing the mutual information between image features and concept anchors while minimizing superfluous information.
2. **Concept-Feature Interference (CFI)**: uses the similarity between the calibrated image features and the concept anchors to generate more discriminative task-specific features.

With these two modules, CATE improves the expressivity and discriminativeness of existing pathology foundation models on specific tasks without additional supervision or significant computational cost, which in turn improves the performance and generalizability of multiple instance learning (MIL) models.

### Formula summary

- **Overall objective function of CATE**:

  \[ L = L_{CE} + \lambda_P L_{PIM} + \lambda_S L_{SIM} \]

  where:
  - \(L_{CE}\) is the cross-entropy loss for the downstream task.
  - \(L_{PIM}\) is the predictive information maximization loss.
  - \(L_{SIM}\) is the superfluous information minimization loss.
  - \(\lambda_P\) and \(\lambda_S\) are weighting hyperparameters.

- **Predictive information maximization (PIM) loss**:

  \[ L_{PIM} = E_{\hat{\alpha}, c^{cs}_{pos}}\left[-\sum_{i=1}^{k}\frac{\hat{\alpha}_i^T c^{cs}_{pos}}{\tau}\right] + E_{\hat{\alpha}, c^{cs}_{pos}}\left[\sum_{i=1}^{k}\log\left(\sum_{j=1}^{m}\exp\left(\frac{\hat{\alpha}_i^T c^{cs}_j}{\tau}\right) + \sum_{j=1}^{n}\exp\left(\frac{\hat{\alpha}_i^T c^{ca}_j}{\tau}\right)\right)\right] \]

- **Superfluous information minimization (SIM) loss**:

  \[ L_{SIM} = E\left[\sum_{i=1}^{k} D_{KL}\left(q_\theta(\alpha_i \mid x_i) \,\|\, r(\alpha_i)\right)\right] \]

Through these modules and loss terms, CATE enhances the discriminative ability of the original features and aligns them with task-specific concept anchors to improve prediction performance.
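The three loss terms in the formula summary can be sketched in plain Python to make their structure concrete. This is a minimal illustration, not the authors' implementation (see the linked repository for that). Several things here are assumptions: the function names `pim_loss`, `sim_loss`, and `total_loss` are hypothetical; there is a single positive concept anchor per bag; the default temperature is arbitrary; the expectation is replaced by a single-sample estimate; and, as in the usual variational information bottleneck setup, \(q_\theta\) is taken to be a diagonal Gaussian and \(r\) a standard normal.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pim_loss(alpha_hat, cs_anchors, ca_anchors, pos_index, tau=0.1):
    """InfoNCE-style PIM term for one bag of k calibrated features.

    alpha_hat:  list of k feature vectors
    cs_anchors: list of m concept anchors (the c^{cs}_j in the formula)
    ca_anchors: list of n concept anchors (the c^{ca}_j in the formula)
    pos_index:  index into cs_anchors of the positive anchor (assumption:
                exactly one positive anchor per bag)
    """
    total = 0.0
    for a in alpha_hat:
        # cs anchors come first, so pos_index is also valid in `logits`.
        logits = [dot(a, c) / tau for c in cs_anchors + ca_anchors]
        # Numerically stable log-sum-exp over all anchors.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += -logits[pos_index] + log_denom
    return total


def sim_loss(mus, log_vars):
    """KL(q || r) summed over k instances.

    Assumption: q is N(mu, diag(exp(log_var))) and r is N(0, I), which
    gives the closed form 0.5 * sum(mu^2 + var - 1 - log_var).
    """
    total = 0.0
    for mu, log_var in zip(mus, log_vars):
        total += 0.5 * sum(
            m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var)
        )
    return total


def total_loss(ce, pim, sim, lambda_p, lambda_s):
    """L = L_CE + lambda_P * L_PIM + lambda_S * L_SIM."""
    return ce + lambda_p * pim + lambda_s * sim


if __name__ == "__main__":
    cs = [[1.0, 0.0], [0.0, 1.0]]   # toy anchors, m = 2
    ca = [[-1.0, 0.0]]              # toy anchor, n = 1
    feats = [[1.0, 0.0]]            # one feature aligned with cs[0]
    pim = pim_loss(feats, cs, ca, pos_index=0, tau=1.0)
    sim = sim_loss([[0.0, 0.0]], [[0.0, 0.0]])
    print(total_loss(ce=0.7, pim=pim, sim=sim, lambda_p=0.5, lambda_s=0.1))
```

Under these assumptions the PIM term is smallest when each feature aligns with the positive anchor, and it is never negative because the positive anchor also appears in the denominator. The SIM term is zero exactly when \(q_\theta\) matches the prior.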

Free Lunch in Pathology Foundation Model: Task-specific Model Adaptation with Concept-Guided Feature Enhancement

Text-guided Foundation Model Adaptation for Pathological Image Classification

A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model

An efficient framework based on large foundation model for cervical cytopathology whole slide image screening

PathoDuet: Foundation Models for Pathological Slide Analysis of H&E and IHC Stains

Multimodal Whole Slide Foundation Model for Pathology

PathoTune: Adapting Visual Foundation Model to Pathological Specialists

Prompt-Guided Adaptive Model Transformation for Whole Slide Image Classification

MGCT: Mutual-Guided Cross-Modality Transformer for Survival Outcome Prediction using Integrative Histopathology-Genomic Features

CAMP: Continuous and Adaptive Learning Model in Pathology

Multimodal Cross-Task Interaction for Survival Analysis in Whole Slide Pathological Images

Towards a general-purpose foundation model for computational pathology

Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images

Aligning Knowledge Concepts to Whole Slide Images for Precise Histopathology Image Analysis

A foundation model for generalizable cancer diagnosis and survival prediction from histopathological images

Tissue Concepts: supervised foundation models in computational pathology

Text-guided Foundation Model Adaptation for Long-Tailed Medical Image Classification

Low-resource finetuning of foundation models beats state-of-the-art in histopathology

Benchmarking foundation models as feature extractors for weakly-supervised computational pathology

Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective

SHAP-CAT: A interpretable multi-modal framework enhancing WSI classification via virtual staining and shapley-value-based multimodal fusion