Abstract 4938: Prediction of gene mutation from colorectal adenocarcinoma whole slide images via integrated deep learning pipeline

Geng-Yun Tien, Liang-Chuan Lai, Tzu-Pin Lu, Mong-Hsun Tsai, Hsiang-Han Chen, Eric Y. Chuang
DOI: https://doi.org/10.1158/1538-7445.am2024-4938
IF: 11.2
2024-03-23
Cancer Research
Abstract: Colorectal cancer (CRC) is one of the deadliest cancer types worldwide. Identifying the specific gene mutations that guide targeted therapy is pivotal for CRC treatment. However, the current genetic testing method involves liquid biopsy, which is often cumbersome and cost-intensive. Histopathological whole-slide images (WSIs) are routinely generated for diagnostic purposes and can offer valuable insights into cellular heterogeneity. In this study, we aim to develop an integrated deep learning pipeline for predicting mutations in TP53, one of the most frequently mutated genes in CRC. We began by collecting 424 diagnostic WSIs from the TCGA-COAD cohorts. Each WSI was split into small patches at a resolution of 0.5 micrometers per pixel for subsequent analysis. Quality control was implemented using Otsu thresholding and Gaussian blur filtering at both the slide-level masking and patch-level filtering stages. Additionally, we applied the Macenko algorithm for color normalization to eliminate systematic errors. Following image preprocessing, we constructed a tissue-type classification model based on VGG19, which was trained and tested using the NCT-CRC-HE-100K and CRC-VAL-HE-7K datasets, both containing pixels with tissue types labeled by human experts. The trained classifier was then used to identify adenocarcinoma epithelium patches from TCGA-COAD. Subsequently, we applied multiple pre-trained Convolutional Neural Network (CNN) models, including ResNet-101, DenseNet-201, and Inception-ResNet-v2, to predict TP53 mutations in the identified adenocarcinoma epithelium patches. A novel integration approach was employed to aggregate patch-level predictions into slide-level outcomes. We evaluated the trained models using several metrics: accuracy, precision, recall, F1 score, AUC (Area Under the Curve), and Cohen's Kappa coefficient.
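The abstract does not describe how the integration step combines patch-level predictions, so the following is only a minimal sketch under an assumed rule: each slide's score is the mean of its patch probabilities. It also shows plain-Python versions of two of the reported metrics, AUC and Cohen's Kappa. The function names (`aggregate_to_slides`, `roc_auc`, `cohen_kappa`) and the toy data are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def aggregate_to_slides(patch_probs, slide_ids):
    """Average patch-level mutation probabilities per slide.

    Mean pooling is an assumption made for illustration; the abstract
    does not specify the authors' integration approach.
    """
    grouped = defaultdict(list)
    for slide_id, prob in zip(slide_ids, patch_probs):
        grouped[slide_id].append(prob)
    return {slide_id: mean(probs) for slide_id, probs in grouped.items()}


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement between two labelings, corrected for chance."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    classes = set(y_true) | set(y_pred)
    expected = sum(
        (y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes
    )
    if expected == 1.0:
        # Only one class appears in both labelings; treat as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy data: nine patches drawn from four hypothetical slides.
    slide_ids = ["S1", "S1", "S1", "S2", "S2", "S3", "S3", "S4", "S4"]
    patch_probs = [0.9, 0.7, 0.8, 0.2, 0.4, 0.6, 0.5, 0.3, 0.1]
    slide_labels = {"S1": 1, "S2": 0, "S3": 1, "S4": 0}

    slide_scores = aggregate_to_slides(patch_probs, slide_ids)
    order = sorted(slide_scores)
    labels = [slide_labels[s] for s in order]
    scores = [slide_scores[s] for s in order]
    calls = [int(score >= 0.5) for score in scores]  # 0.5 cutoff is arbitrary

    print("slide-level AUC:", roc_auc(labels, scores))
    print("slide-level kappa:", cohen_kappa(labels, calls))
```

Mean pooling is the simplest baseline for this step; alternatives such as majority voting or taking the top-k patch scores would slot into `aggregate_to_slides` without changing the evaluation code.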
The tissue-type classification model exhibited robust performance, achieving an overall accuracy of 94.7%, a Cohen's Kappa coefficient of 94%, a precision of 95.3%, a recall of 99.76%, and an F1 score of 97.48% for identifying adenocarcinoma epithelium patches. For TP53 mutation prediction, ResNet-101, DenseNet-201, and Inception-ResNet-v2 showed predictive ability at both the patch level (AUC=65%) and the slide level (AUC=70%). In summary, the proposed deep learning pipeline offers an efficient way to acquire gene mutation information from WSIs. Moreover, it demonstrates the capability to discover potential molecular aberrations from clinical images.
Citation Format: Geng-Yun Tien, Liang-Chuan Lai, Tzu-Pin Lu, Mong-Hsun Tsai, Hsiang-Han Chen, Eric Y. Chuang. Prediction of gene mutation from colorectal adenocarcinoma whole slide images via integrated deep learning pipeline [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4938.
oncology