SC-MIL: Supervised Contrastive Multiple Instance Learning for Imbalanced Classification in Pathology

Dinkar Juyal, Siddhant Shingi, Syed Ashar Javed, Harshith Padigela, Chintan Shah, Anand Sampat, Archit Khosla, John Abel, Amaro Taylor-Weiner
2023-09-09
Abstract: Multiple instance learning (MIL) models have been extensively used in pathology to predict biomarkers and risk-stratify patients from gigapixel-sized images. Machine learning problems in medical imaging often deal with rare diseases, making it important for these models to work in a label-imbalanced setting. In pathology images, there is another level of imbalance, where given a positively labeled Whole Slide Image (WSI), only a fraction of pixels within it contribute to the positive label. This compounds the severity of imbalance and makes imbalanced classification in pathology challenging. Furthermore, these imbalances can occur in out-of-distribution (OOD) datasets when the models are deployed in the real world. We leverage the idea that decoupling feature and classifier learning can lead to improved decision boundaries for label-imbalanced datasets. To this end, we investigate the integration of supervised contrastive learning with multiple instance learning (SC-MIL). Specifically, we propose a joint-training MIL framework in the presence of label imbalance that progressively transitions from learning bag-level representations to optimal classifier learning. We perform experiments with different imbalance settings for two well-studied problems in cancer pathology: subtyping of non-small cell lung cancer and subtyping of renal cell carcinoma. SC-MIL provides large and consistent improvements over other techniques on both in-distribution (ID) and OOD held-out sets across multiple imbalanced settings.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses classification in pathology when labels are imbalanced both across the dataset and within each whole slide image (WSI). Specifically:

1. **Label imbalance**: In pathology datasets, the number of samples per class can be severely imbalanced. In addition, within a WSI labeled as positive, only a small fraction of pixels or patches contribute to the positive label. This second level of imbalance compounds the first and makes imbalanced classification in pathology especially challenging.
2. **Model generalization**: Existing multiple instance learning (MIL) models may overfit when trained on imbalanced data, and their performance may degrade on out-of-distribution (OOD) data. A model therefore needs to perform well across a range of imbalance settings to be reliable in real clinical environments.

To address these issues, the authors propose SC-MIL, which integrates supervised contrastive learning into the MIL framework. Training progressively transitions from learning bag-level representations to learning the classifier. By decoupling feature learning from classifier learning in this way, SC-MIL aims to improve the decision boundary for label-imbalanced pathology data. In experiments on subtyping of non-small cell lung cancer and renal cell carcinoma, SC-MIL provides large and consistent improvements over other techniques on both in-distribution (ID) and OOD held-out sets across multiple imbalance settings.
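
The progressive transition from bag-level representation learning to classifier learning can be illustrated with a weighted sum of a supervised contrastive loss and a cross-entropy loss. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the linear schedule in `contrastive_weight`, the temperature of 0.1, and all function names are hypothetical, since the text above does not specify them. Bag embeddings are assumed to come from some MIL aggregator that is not shown.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of bag-level embeddings."""
    # L2-normalise so the dot product is cosine similarity.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    not_self = ~np.eye(len(labels), dtype=bool)
    # Log-softmax over all *other* bags (self-similarity is excluded).
    sim = sim - sim.max(axis=1, keepdims=True)
    denom = (np.exp(sim) * not_self).sum(axis=1, keepdims=True)
    log_prob = sim - np.log(denom)
    # Positives are other bags that share the anchor's label.
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0  # anchors with no positive in the batch are skipped
    if not valid.any():
        return 0.0
    per_anchor = -(log_prob * positives).sum(axis=1)[valid] / n_pos[valid]
    return float(per_anchor.mean())

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def contrastive_weight(epoch, total_epochs):
    """Hypothetical linear schedule: 1.0 at epoch 0, 0.0 at the last epoch."""
    return 1.0 - epoch / max(total_epochs - 1, 1)

def joint_loss(embeddings, logits, labels, epoch, total_epochs):
    """Early epochs emphasise representations, late epochs the classifier."""
    alpha = contrastive_weight(epoch, total_epochs)
    return (alpha * supcon_loss(embeddings, labels)
            + (1.0 - alpha) * cross_entropy(logits, labels))
```

With this schedule the first epoch optimises only the contrastive term and the final epoch optimises only the classification term. Other decay shapes (for example, a step or cosine schedule) would fit the same structure.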