Abstract 4970: Multi-modal machine learning approaches for predicting cancer type and Gleason grade leveraging public TCGA data
Christian Wohlfart, Eldad Klaiman, Jacub Witkowski, Michael King, Jacob Gildenblat, Ofir Etz-Hadar, Mohammad Ashtari, Antoaneta Vladimirova
DOI: https://doi.org/10.1158/1538-7445.am2024-4970
IF: 11.2
2024-03-31
Cancer Research
Abstract: Introduction: Understanding complex diseases such as cancer, and diagnosing them more accurately, may require combining multiple data modalities, such as histopathological images and omics data (e.g., RNA-seq). By integrating these heterogeneous but complementary data, a multimodal approach could achieve better results than either modality alone. The growing availability of large datasets such as The Cancer Genome Atlas (TCGA), with more than 10,000 patients, has made it possible to combine modalities when training machine learning algorithms, which offers great potential for addressing challenging cancer-related research questions. In this proof-of-concept initiative, we use machine learning approaches within an open-source framework to leverage multimodality (histopathology whole slide images (WSI) and genomics/RNA-seq) to build predictive AI models for cancer type and prostate Gleason score, and to provide a basis for developing a quality control step. Method: We used matched WSI and RNA-seq profiles from TCGA, comprising 11,093 samples across 30 cancer types, to develop a pan-cancer classification model using both modalities. For prostate Gleason score prediction, 401 patients were available. Both datasets were split into training (70%) and test (30%) sets. We used a late fusion approach in which we combined the RNA-seq model (linear SVM) with the WSI model (ResNet18) by multiplying the probability scores of the two single-modality models. Model performance was measured with the F1 metric. Results: For cancer type prediction, the multimodal model achieved an F1 score of 0.95 on the test set. About 40% of the cancer types benefited from a synergistic effect when the two modalities were combined.
The cancer types that benefited most from combining modalities, with the percent increase in F1 score, were cervical squamous cell carcinoma and endocervical adenocarcinoma (4.23%), cholangiocarcinoma (6.66%), and uterine carcinosarcoma (4%). Interestingly, in other cancer types, e.g., rectum adenocarcinoma, sarcoma, or stomach adenocarcinoma, the combination did not improve predictive scores compared to a single-modality model. For prostate cancer grading (prediction of Gleason patterns 3/4/5), the combined multimodal model achieved an F1 score of 0.73, outperforming the single-modality models. Conclusion: By combining histopathology imaging and omics modalities, we demonstrated synergistic effects in predictive power for both cancer-related research questions. Combining both modalities improved predictive performance in 40% of the classified cancer types. Imaging or omics modalities alone can be sufficient in some cases, and their strengths are very problem-specific. Citation Format: Christian Wohlfart, Eldad Klaiman, Jacub Witkowski, Michael King, Jacob Gildenblat, Ofir Etz-Hadar, Mohammad Ashtari, Antoaneta Vladimirova. Multi-modal machine learning approaches for predicting cancer type and Gleason grade leveraging public TCGA data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4970.
oncology
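
The late fusion step described in the Method section (multiplying the class probability scores of the RNA-seq model and the WSI model) can be illustrated with a minimal sketch. This is not the authors' code: the function name `late_fusion_product`, the toy probability values, and the renormalisation of the fused scores are all assumptions made for illustration. The abstract states only that the probability scores of the two single-modality models were multiplied.

```python
import numpy as np


def late_fusion_product(p_rna, p_wsi, eps=1e-12):
    """Fuse two (n_samples, n_classes) probability matrices by elementwise product.

    Hypothetical helper: the name and the renormalisation step are assumptions,
    not details given in the abstract.
    """
    p_rna = np.asarray(p_rna, dtype=float)
    p_wsi = np.asarray(p_wsi, dtype=float)
    if p_rna.shape != p_wsi.shape:
        raise ValueError("both models must score the same samples and classes")

    # Late fusion: multiply the per-class scores of the two models.
    fused = p_rna * p_wsi

    # Assumed step: rescale each row to sum to 1. This does not change
    # which class has the highest fused score.
    fused = fused / np.maximum(fused.sum(axis=1, keepdims=True), eps)

    # Predicted class index per sample.
    return fused, fused.argmax(axis=1)


# Toy scores for 2 samples and 3 classes (made-up numbers, not TCGA results).
p_rna = np.array([[0.6, 0.3, 0.1],
                  [0.4, 0.5, 0.1]])  # e.g. output of the RNA-seq model
p_wsi = np.array([[0.5, 0.4, 0.1],
                  [0.7, 0.2, 0.1]])  # e.g. output of the WSI model

fused, labels = late_fusion_product(p_rna, p_wsi)
print(labels)
```

In the second toy sample the two models disagree (the RNA-seq scores favour class 1, the WSI scores favour class 0), and the product favours class 0 because the WSI model is more confident. A product rule of this kind rewards classes that both models consider plausible, which is one way a fused model can outperform either single-modality model.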