Explainable AI for computational pathology identifies model limitations and tissue biomarkers

Jakub R. Kaczmarzyk, Joel H. Saltz, Peter K. Koo
2024-09-05
Abstract: Deep learning models have shown promise in histopathology image analysis, but their opaque decision-making process poses challenges in high-risk medical scenarios. Here we introduce HIPPO, an explainable AI method that interrogates attention-based multiple instance learning (ABMIL) models in computational pathology by generating counterfactual examples through tissue patch modifications in whole slide images. Applying HIPPO to ABMIL models trained to detect breast cancer metastasis reveals that they may overlook small tumors and can be misled by non-tumor tissue, while attention maps, widely used for interpretation, often highlight regions that do not directly influence predictions. By interpreting ABMIL models trained on a prognostic prediction task, HIPPO identified tissue areas with stronger prognostic effects than high-attention regions, which sometimes showed counterintuitive influences on risk scores. These findings demonstrate HIPPO's capacity for comprehensive model evaluation, bias detection, and quantitative hypothesis testing. HIPPO greatly expands the capabilities of explainable AI tools to assess the trustworthy and reliable development, deployment, and regulation of weakly-supervised models in computational pathology.
Tissues and Organs
What problem does this paper attempt to address?
This paper addresses the opacity of deep learning models in computational pathology. Although these models show promise in histopathological image analysis, their opaque decision-making poses challenges in high-risk medical scenarios. Attention-based multiple instance learning (ABMIL) models in particular are difficult to interpret, because their attention maps do not always reflect which regions actually influence a prediction. In addition, existing post-hoc explanation methods such as LIME and SHAP assume that individual pixels have an additive or linear influence, so they may not accurately reflect the model's decision-making process.

To address these problems, the paper introduces HIPPO (Histopathology Interventions of Patches for Predictive Outcomes), an explainable AI method that evaluates ABMIL models by generating counterfactual examples through modifying tissue patches in whole slide images. The main goals of HIPPO are:

1. **Evaluate the limitations and biases of the model**: By generating counterfactual examples, HIPPO can reveal cases where a model overlooks small tumors or is misled by non-tumor tissue when detecting breast cancer metastases.
2. **Identify tissue biomarkers**: HIPPO can identify tissue areas that have a stronger effect on prognostic predictions than the high-attention areas do.
3. **Provide a more comprehensive model evaluation**: Beyond measuring performance, HIPPO can detect model biases and support quantitative hypothesis testing.
4. **Enhance the credibility and reliability of the model**: By exposing how a model reaches its decisions, HIPPO helps assess whether the model is trustworthy and reliable enough for clinical applications.

The paper demonstrates the practical value of HIPPO in computational pathology by applying it to two main tasks: metastasis detection and cancer prognosis prediction.
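The patch-removal idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy ABMIL forward pass, the hand-set weights, the synthetic patch embeddings, and the names `abmil_predict` and `removal_effect` are all assumptions made for illustration. The sketch deletes a set of patches from a slide, re-runs the model, and reports how much the predicted probability changes.

```python
import numpy as np

def abmil_predict(patches, V, w, clf, bias):
    """Toy ABMIL forward pass: returns (probability, attention weights)."""
    logits = np.tanh(patches @ V) @ w           # one attention logit per patch
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                          # softmax over the patches in the slide
    slide_embedding = attn @ patches            # attention-weighted mean embedding
    prob = 1.0 / (1.0 + np.exp(-(slide_embedding @ clf + bias)))
    return prob, attn

def removal_effect(patches, remove_idx, *params):
    """Prediction change from deleting patches (original minus counterfactual)."""
    keep = np.ones(len(patches), dtype=bool)
    keep[remove_idx] = False
    p_orig, _ = abmil_predict(patches, *params)
    p_cf, _ = abmil_predict(patches[keep], *params)
    return p_orig - p_cf

# Hypothetical data: 200 "normal" embeddings near zero and
# 5 "tumor" embeddings with a strong first feature.
rng = np.random.default_rng(0)
dim = 8
normal = rng.normal(0.0, 0.1, size=(200, dim))
tumor = rng.normal(0.0, 0.1, size=(5, dim))
tumor[:, 0] += 3.0
patches = np.vstack([tumor, normal])            # tumor patches occupy indices 0..4

# Hand-set toy weights (not trained): attention and classifier both read feature 0.
e0 = np.eye(dim)[0]
params = (np.eye(dim), 4.0 * e0, 4.0 * e0, -2.0)

tumor_idx = np.arange(5)
control_idx = np.arange(5, 10)                  # five normal patches as a control
effect_tumor = removal_effect(patches, tumor_idx, *params)
effect_control = removal_effect(patches, control_idx, *params)
print(f"effect of removing tumor patches:  {effect_tumor:.3f}")
print(f"effect of removing normal patches: {effect_control:.3f}")
```

In this toy setup, removing the tumor patches should lower the predicted probability substantially, while removing the same number of normal patches should change it very little. The same pattern (edit the bag of patches, re-run the model, compare outputs) extends to adding patches instead of removing them.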
Specific applications include:

- **Metastasis detection**: Using the CAMELYON16 dataset to evaluate five foundation models on breast cancer metastasis detection, revealing model-specific limitations and biases.
- **Cancer prognosis prediction**: Applying HIPPO to ABMIL models trained for prognostic prediction, identifying tissue areas with stronger prognostic effects than high-attention regions, and finding that high-attention areas can sometimes have counterintuitive effects on risk scores.

In summary, by introducing HIPPO, this paper aims to address the interpretability and credibility problems of deep learning models in computational pathology, providing a new method for developing more reliable and clinically relevant AI tools.
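The gap between attention and actual influence on a risk score can be illustrated with a small leave-one-out sketch. This is a simplified assumption, not the paper's procedure: the risk score is modeled as a softmax-attention-weighted mean of hypothetical per-patch risk contributions, and the function names and numbers are invented for illustration.

```python
import numpy as np

def risk_score(attn_logits, patch_risk):
    """Attention-pooled risk: softmax(attn_logits)-weighted mean of per-patch risk."""
    weights = np.exp(attn_logits - attn_logits.max())
    weights /= weights.sum()
    return float(weights @ patch_risk), weights

def leave_one_out_effects(attn_logits, patch_risk):
    """Effect of patch k = risk with all patches minus risk with patch k removed."""
    full, _ = risk_score(attn_logits, patch_risk)
    effects = np.empty(len(patch_risk))
    for k in range(len(patch_risk)):
        keep = np.arange(len(patch_risk)) != k
        effects[k] = full - risk_score(attn_logits[keep], patch_risk[keep])[0]
    return effects

# Hypothetical slide: patch 0 receives the most attention but carries the same
# risk as the background; patch 1 receives less attention but carries high risk.
attn_logits = np.array([2.0, 1.0] + [0.0] * 8)
patch_risk = np.array([0.5, 3.0] + [0.5] * 8)

_, attention = risk_score(attn_logits, patch_risk)
effects = leave_one_out_effects(attn_logits, patch_risk)

print("most attended patch:  ", int(np.argmax(attention)))
print("most influential patch:", int(np.argmax(np.abs(effects))))
print("effect of most attended patch:", round(float(effects[0]), 3))
```

In this example the most attended patch is not the most influential one, and removing it raises the risk score (its effect is negative) because the attention it held is redistributed onto the high-risk patch. Measuring effects by intervention, rather than reading them off the attention weights, is what makes such cases visible.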