Ensemble transformer-based multiple instance learning to predict pathological subtypes and tumor mutational burden from histopathological whole slide images of endometrial and colorectal cancer

Ching-Wei Wang, Tzu-Chien Liu, Po-Jen Lai, Hikam Muzakky, Yu-Chi Wang, Mu-Hsien Yu, Chia-Hua Wu, Tai-Kuang Chao
DOI: https://doi.org/10.1016/j.media.2024.103372
Abstract: In endometrial cancer (EC) and colorectal cancer (CRC), tumor mutational burden (TMB) has, in addition to microsatellite instability, gradually gained attention as a genomic biomarker that can be used clinically to determine which patients may benefit from immune checkpoint inhibitors. High TMB is characterized by a large number of mutated genes that encode aberrant tumor neoantigens, and it suggests a better response to immunotherapy. Hence, the subset of EC and CRC patients with high TMB may be better candidates for immunotherapy. TMB has mainly been measured by whole-exome sequencing or next-generation sequencing, which is costly and difficult to apply widely across clinical cases. Therefore, an effective, efficient, low-cost and easily accessible tool is urgently needed to determine the TMB status of EC and CRC patients. In this study, we present a deep learning framework, Ensemble Transformer-based Multiple Instance Learning with a Self-Supervised Learning Vision Transformer feature encoder (ETMIL-SSLViT), that predicts pathological subtype and TMB status directly from H&E-stained whole slide images (WSIs) of EC and CRC patients, supporting both pathological classification and cancer treatment planning. Our framework was evaluated on two cancer cohorts from The Cancer Genome Atlas: an EC cohort with 918 histopathology WSIs from 529 patients and a CRC cohort with 1495 WSIs from 594 patients. The experimental results show that the proposed methods achieved excellent performance and outperformed seven state-of-the-art (SOTA) methods in both cancer subtype classification and TMB prediction on both datasets. Fisher's exact test further confirmed that the associations between the models' predictions and the actual cancer subtype and TMB status are highly statistically significant (p < 0.001).
These promising findings show the potential of the proposed methods to guide personalized treatment decisions by accurately predicting cancer subtype and TMB status, supporting effective immunotherapy planning for EC and CRC patients.
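The abstract validates its predictions with Fisher's exact test. As a hedged illustration (not the authors' code), the sketch below computes a two-sided Fisher's exact test p-value for a 2x2 contingency table, assuming rows are predicted TMB-high/TMB-low and columns are actual TMB-high/TMB-low. The function names and toy counts are hypothetical, and the sketch covers only the 2x2 case. It uses only the Python standard library; `scipy.stats.fisher_exact` provides the same test in practice.

```python
from math import comb


def hypergeom_pmf(x: int, row1: int, col1: int, n: int) -> float:
    """P(top-left cell == x) for a 2x2 table with fixed row/column margins."""
    return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)


def fisher_exact_two_sided(table: list[list[int]]) -> float:
    """Two-sided Fisher's exact test p-value for a 2x2 contingency table.

    Hypothetical layout: table[0] = predicted high, table[1] = predicted low;
    column 0 = actually high, column 1 = actually low.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    # Feasible range of the top-left cell given the fixed margins.
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1)]
    p_obs = hypergeom_pmf(a, row1, col1, n)
    # Sum every table that is no more probable than the observed one;
    # the small relative tolerance guards against floating-point ties.
    p_value = sum(q for q in probs if q <= p_obs * (1 + 1e-7))
    return min(1.0, p_value)


if __name__ == "__main__":
    # Toy counts only, not data from the paper.
    print(fisher_exact_two_sided([[3, 1], [1, 3]]))    # about 0.486
    print(fisher_exact_two_sided([[10, 0], [0, 10]]))  # about 1.08e-05
```

The two-sided p-value sums the probabilities of all tables sharing the observed margins that are no more likely than the observed table, so a near-diagonal table (predictions agreeing with ground truth) yields a very small p-value.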