Quantitative Evaluation of MILs' Reliability For WSIs Classification

Hassan Keshvarikhojasteh
2024-09-17
Abstract: Reliable models are dependable and provide predictions acceptable given basic domain knowledge. Therefore, it is critical to develop and deploy reliable models, especially for healthcare applications. However, Multiple Instance Learning (MIL) models designed for Whole Slide Images (WSIs) classification in computational pathology are not evaluated in terms of reliability. Hence, in this paper we compare the reliability of MIL models with three suggested metrics and use three region-wise annotated datasets. We find the mean pooling instance (MEAN-POOL-INS) model more reliable than other networks despite its naive architecture design and computation efficiency. The code to reproduce the results is accessible at https://github.com/tueimage/MILs'R.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper evaluates the reliability of Multiple Instance Learning (MIL) models for Whole Slide Image (WSI) classification. It points out that current MIL models in computational pathology lack a quantitative evaluation of reliability. The authors therefore propose three quantitative metrics for MIL reliability and verify them experimentally on three region-wise annotated datasets.

### Problem Background

1. **Importance of MIL Models**:
   - MIL is a weakly supervised classification method that is widely used in computational pathology, because obtaining pixel-level annotations of histopathological data is usually very time-consuming.
   - MIL models classify by creating bags of instances and predicting the labels of these bags.
2. **Importance of Reliability**:
   - A reliable model provides acceptable predictions that are in line with basic domain knowledge, which is crucial for medical applications.
   - In computational pathology, a reliable model should base its predictions on biologically consistent features supported by scientific evidence.
3. **Limitations of Existing Evaluation Methods**:
   - At present, most studies evaluate model reliability only qualitatively, for example by showing specific slides and their heat maps.
   - Qualitative evaluation cannot cover every slide in the test set, and it requires pathology expertise, which makes it unsuitable for machine learning researchers.

### Main Contributions of the Paper

1. **Proposing Quantitative Evaluation Methods**:
   - Three quantitative metrics are used to evaluate the reliability of MIL models: Mutual Information (MI), Spearman's Correlation (rs), and Area Under the Precision-Recall Curve (PR-AUC).
2. **Experimental Verification**:
   - Experiments use three public WSI datasets (Camelyon16, CATCH, and TCGA BRCA) to make the evaluation comprehensive and diverse.
3. **Findings and Conclusions**:
   - The MEAN-POOL-INS model shows high reliability despite its simple architecture and low computational cost.
   - Multi-head attention models (such as ACMIL and MADMIL) perform well in both classification performance and reliability, but have high computational costs.
   - The authors call for more attention to model reliability and computational cost, and recommend considering both when developing new models.

Through this work, the authors hope to promote the adoption of more reliable and lightweight MIL models for WSI classification.
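
To make the three metrics named above more concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that each metric compares a model's per-patch scores (for example attention weights or instance predictions) with binary patch labels derived from the region annotations; the summary does not spell out the exact protocol. The function names, the equal-width binning used for MI, and the use of average precision as the PR-AUC estimate are all illustrative assumptions.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(scores, labels):
    """Spearman's r_s: Pearson correlation of the rank-transformed inputs.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = average_ranks(scores), average_ranks(labels)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def pr_auc(scores, labels):
    """Average precision, a step-wise estimate of PR-AUC.

    Assumes at least one positive label and no tied scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at each recalled positive
    return total / hits


def mutual_information(scores, labels, bins=4):
    """MI in nats between equal-width-binned scores and binary labels."""
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all scores are equal
    binned = [min(int((s - lo) / width), bins - 1) for s in scores]
    n = len(scores)
    joint = Counter(zip(binned, labels))
    px, py = Counter(binned), Counter(labels)
    return sum(
        (c / n) * math.log(c * n / (px[b] * py[y]))
        for (b, y), c in joint.items()
    )


# Hypothetical toy slide: six patches, the first three inside an annotated region.
patch_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
patch_labels = [1, 1, 1, 0, 0, 0]

print("MI     :", mutual_information(patch_scores, patch_labels))
print("r_s    :", spearman(patch_scores, patch_labels))
print("PR-AUC :", pr_auc(patch_scores, patch_labels))
```

Under these assumptions, higher values on all three metrics indicate that the model's patch scores agree more closely with the annotated regions. In practice, library routines such as those in SciPy and scikit-learn would likely replace these hand-written versions.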